* [PATCH v2 00/15] Make the user mode driver code a better citizen
       [not found] ` <87y2oac50p.fsf@x220.int.ebiederm.org>
@ 2020-06-29 19:55 ` Eric W. Biederman
  2020-06-29 19:56   ` [PATCH v2 01/15] umh: Capture the pid in umh_pipe_setup Eric W. Biederman
                     ` (17 more replies)
  0 siblings, 18 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 19:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov,
	Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler,
	Luis Chamberlain, Linus Torvalds

This is the second round of my changeset to split the user mode driver
code from the user mode helper code, and to make the code use common
facilities to get things done instead of recreating them just for the
user mode driver code.

I have split the changes into small enough pieces so they should be
easily readable and testable. The changes lean into the preexisting
interfaces in the kernel and remove special cases for user mode driver
code in favor of solutions that don't need special cases. This results
in smaller code with fewer bugs.

At a practical level this removes the maintenance burden of the user
mode drivers from the user mode helper code and from exec as the
special cases are removed.

Similarly the LSM interaction bugs are fixed by not having unnecessary
special cases for user mode drivers.

I have tested these changes by booting with the code compiled in and
by killing "bpfilter_umh" and running iptables -vnL to restart
the userspace driver.

I have compile tested each change with and without CONFIG_BPFILTER
enabled.

I made a few very small changes from v1 to v2:
- Updated the function name in a comment when the function is renamed
- Moved some more code so that the !CONFIG_BPFILTER case continues
  to compile when I moved the code into umd.c
- A fix for the module loading case to really flush the file descriptor.
- Removed argv_split entirely from fork_usermode_driver.
  There was nothing to split so it was just confusing.

Please let me know if you see any bugs. Once the code review is
finished I plan to place the code in a non-rebasing branch
so I can pull it into my tree and so it can also be pulled into
the bpf-next tree.

Eric W. Biederman (15):
      umh: Capture the pid in umh_pipe_setup
      umh: Move setting PF_UMH into umh_pipe_setup
      umh: Rename the user mode driver helpers for clarity
      umh: Remove call_usermodehelper_setup_file.
      umh: Separate the user mode driver and the user mode helper support
      umd: For clarity rename umh_info umd_info
      umd: Rename umd_info.cmdline umd_info.driver_name
      umd: Transform fork_usermode_blob into fork_usermode_driver
      umh: Stop calling do_execve_file
      exec: Remove do_execve_file
      bpfilter: Move bpfilter_umh back into init data
      umd: Track user space drivers with struct pid
      bpfilter: Take advantage of the facilities of struct pid
      umd: Remove exit_umh
      umd: Stop using split_argv

 fs/exec.c                        |  38 ++------
 include/linux/binfmts.h          |   1 -
 include/linux/bpfilter.h         |   7 +-
 include/linux/sched.h            |   9 --
 include/linux/umd.h              |  18 ++++
 include/linux/umh.h              |  15 ----
 kernel/Makefile                  |   1 +
 kernel/exit.c                    |   1 -
 kernel/umd.c                     | 182 +++++++++++++++++++++++++++++++++++++++
 kernel/umh.c                     | 171 +-----------------------------------
 net/bpfilter/bpfilter_kern.c     |  38 ++++----
 net/bpfilter/bpfilter_umh_blob.S |   2 +-
 net/ipv4/bpfilter/sockopt.c      |  20 +++--
 13 files changed, 248 insertions(+), 255 deletions(-)

v1: https://lkml.kernel.org/r/87pn9mgfc2.fsf_-_@x220.int.ebiederm.org
---

git range-diff master v1 v2
 1:  2b76f9b3158d !
1: d8fb851fa3d8 umh: Capture the pid in umh_pipe_setup @@ Commit message code that is specific to user mode drivers from the common user path of user mode helpers. + Link: https://lkml.kernel.org/r/87h7uygf9i.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umh.h ## 2: d853e933ae32 ! 2: b191c5df43ec umh: Move setting PF_UMH into umh_pipe_setup @@ Commit message Setting PF_UMH unconditionally is harmless as an action will only happen if it is paired with an entry on umh_list. + Link: https://lkml.kernel.org/r/87bll6gf8t.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## kernel/umh.c ## 3: 92d2550f0d6a ! 3: 74e8c0bf3076 umh: Rename the user mode driver helpers for clarity @@ Commit message don't make much sense. Instead name them umd_setup and umd_cleanup for the functional role in setting up user mode drivers. + Link: https://lkml.kernel.org/r/875zbegf82.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## kernel/umh.c ## @@ kernel/umh.c: static int umh_pipe_setup(struct subprocess_info *info, struct cre { struct umh_info *umh_info = info->data; +- /* cleanup if umh_pipe_setup() was successful but exec failed */ ++ /* cleanup if umh_setup() was successful but exec failed */ + if (info->retval) { + fput(umh_info->pipe_to_umh); + fput(umh_info->pipe_from_umh); @@ kernel/umh.c: int fork_usermode_blob(void *data, size_t len, struct umh_info *info) } 4: 5a9cc2c6c64f ! 4: 6652f7c0a909 umh: Remove call_usermodehelper_setup_file. @@ Commit message For this to work the argv_free is moved from umh_clean_and_save_pid to fork_usermode_blob. 
+ Link: https://lkml.kernel.org/r/87zh8qf0mp.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umh.h ## 5: 03ed13fa8eee ! 5: 2a1ccb05cf9f umh: Separate the user mode driver and the user mode helper support @@ Commit message This makes the kernel smaller for everyone who does not use a usermode driver. + v2: Moved exit_umh from sched.h to umd.h and handle the case when the + code is compiled out. + + Link: https://lkml.kernel.org/r/87tuyyf0ln.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/bpfilter.h ## @@ include/linux/bpfilter.h struct sock; int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval, + ## include/linux/sched.h ## +@@ include/linux/sched.h: static inline void rseq_execve(struct task_struct *t) + + #endif + +-void __exit_umh(struct task_struct *tsk); +- +-static inline void exit_umh(struct task_struct *tsk) +-{ +- if (unlikely(tsk->flags & PF_UMH)) +- __exit_umh(tsk); +-} +- + #ifdef CONFIG_DEBUG_RSEQ + + void rseq_syscall(struct pt_regs *regs); + ## include/linux/umd.h (new) ## @@ +#ifndef __LINUX_UMD_H__ @@ include/linux/umd.h (new) + +#include <linux/umh.h> + ++#ifdef CONFIG_BPFILTER ++void __exit_umh(struct task_struct *tsk); ++ ++static inline void exit_umh(struct task_struct *tsk) ++{ ++ if (unlikely(tsk->flags & PF_UMH)) ++ __exit_umh(tsk); ++} ++#else ++static inline void exit_umh(struct task_struct *tsk) ++{ ++} ++#endif ++ +struct umh_info { + const char *cmdline; + struct file *pipe_to_umh; @@ kernel/Makefile: obj-y = fork.o exec_domain.o panic.o \ obj-$(CONFIG_MULTIUSER) += groups.o + ## kernel/exit.c ## +@@ + #include <linux/random.h> + #include <linux/rcuwait.h> + #include <linux/compat.h> ++#include <linux/umd.h> + + #include <linux/uaccess.h> + #include <asm/unistd.h> + ## 
kernel/umd.c (new) ## @@ +// SPDX-License-Identifier: GPL-2.0-only @@ kernel/umd.c (new) +{ + struct umh_info *umh_info = info->data; + -+ /* cleanup if umh_pipe_setup() was successful but exec failed */ ++ /* cleanup if umh_setup() was successful but exec failed */ + if (info->retval) { + fput(umh_info->pipe_to_umh); + fput(umh_info->pipe_from_umh); @@ kernel/umh.c: struct subprocess_info *call_usermodehelper_setup(const char *path -{ - struct umh_info *umh_info = info->data; - -- /* cleanup if umh_pipe_setup() was successful but exec failed */ +- /* cleanup if umh_setup() was successful but exec failed */ - if (info->retval) { - fput(umh_info->pipe_to_umh); - fput(umh_info->pipe_from_umh); 6: 698bfbcb6c7f ! 6: b16081fb8d92 umd: For clarity rename umh_info umd_info @@ Commit message This structure is only used for user mode drivers so change the prefix from umh to umd to make that clear. + Link: https://lkml.kernel.org/r/87o8p6f0kw.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/bpfilter.h ## @@ include/linux/bpfilter.h: int bpfilter_ip_set_sockopt(struct sock *sk, int optna int (*sockopt)(struct sock *sk, int optname, ## include/linux/umd.h ## -@@ - - #include <linux/umh.h> +@@ include/linux/umd.h: static inline void exit_umh(struct task_struct *tsk) + } + #endif -struct umh_info { +struct umd_info { @@ kernel/umd.c: static int umd_setup(struct subprocess_info *info, struct cred *ne - struct umh_info *umh_info = info->data; + struct umd_info *umd_info = info->data; - /* cleanup if umh_pipe_setup() was successful but exec failed */ + /* cleanup if umh_setup() was successful but exec failed */ if (info->retval) { - fput(umh_info->pipe_to_umh); - fput(umh_info->pipe_from_umh); 7: 9cdcb5e7fc61 ! 
7: 42c13aa9c526 umd: Rename umd_info.cmdline umd_info.driver_name @@ Commit message driver_name any place where the code is looking for a name of the binary. + Link: https://lkml.kernel.org/r/87imfef0k3.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umd.h ## -@@ - #include <linux/umh.h> +@@ include/linux/umd.h: static inline void exit_umh(struct task_struct *tsk) + #endif struct umd_info { - const char *cmdline; 8: 5ada2f70ae21 ! 8: 385ed14a025b umd: Transform fork_usermode_blob into fork_usermode_driver @@ Commit message path based LSMs there are no new special cases. [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ + Link: https://lkml.kernel.org/r/87d05mf0j9.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umd.h ## @@ include/linux/umd.h #include <linux/umh.h> +#include <linux/path.h> - struct umd_info { - const char *driver_name; + #ifdef CONFIG_BPFILTER + void __exit_umh(struct task_struct *tsk); @@ include/linux/umd.h: struct umd_info { struct file *pipe_from_umh; struct list_head list; @@ kernel/umd.c #include <linux/pipe_fs_i.h> +#include <linux/mount.h> +#include <linux/fs_struct.h> ++#include <linux/task_work.h> #include <linux/umd.h> static LIST_HEAD(umh_list); @@ kernel/umd.c + return ERR_PTR(err); + } + -+ __fput_sync(file); ++ fput(file); ++ ++ /* Flush delayed fput so exec can open the file read-only */ ++ flush_delayed_fput(); ++ task_work_run(); + return mnt; +} + 9: e4ff478e77c9 ! 9: eeae92e3f0da umh: Stop calling do_execve_file @@ Commit message call_usermodehelper_exec_async that would call do_execve_file instead of do_execve if file was set. 
+ Link: https://lkml.kernel.org/r/877dvuf0i7.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umh.h ## 10: dc0a38f6bd51 ! 10: c7fdaf5660b8 exec: Remove do_execve_file @@ Commit message Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ + Link: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## fs/exec.c ## 11: d0c0c2ddf53b ! 11: 43d08e6986a7 bpfilter: Move bpfilter_umh back into init data @@ Commit message the blob the blob no longer needs to live .rodata to allow for restarting. So move the blob back to .init.rodata. + Link: https://lkml.kernel.org/r/87sgeidlvq.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## net/bpfilter/bpfilter_umh_blob.S ## 12: 51b703ad75dd ! 12: 729ee744af46 umd: Track user space drivers with struct pid @@ Commit message As the tgid is now refcounted verify the tgid is NULL at the start of fork_usermode_driver to avoid the possibility of silent pid leaks. + Link: https://lkml.kernel.org/r/87mu4qdlv2.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umd.h ## 13: cdadf89503c9 ! 13: 2d85b10b965e bpfilter: Take advantage of the facilities of struct pid @@ Commit message struct pid can be tested to see if a process still exists, and that struct pid has a wait queue that notifies when the process dies. + Link: https://lkml.kernel.org/r/87h7uydlu9.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> ## include/linux/bpfilter.h ## 14: 1d621649e144 ! 14: 6e7e8ddd2b44 umd: Remove exit_umh @@ Commit message callback is what exit_umh exists to call. So remove exit_umh and all of it's associated booking. + Link: https://lkml.kernel.org/r/87bll6dlte.fsf_-_@x220.int.ebiederm.org + Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/sched.h ## @@ include/linux/sched.h: extern struct pid *cad_pid; #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ #define PF_MEMALLOC_NOCMA 0x10000000 /* All allocation request will have _GFP_MOVABLE cleared */ -@@ include/linux/sched.h: static inline void rseq_execve(struct task_struct *t) - - #endif + + ## include/linux/umd.h ## +@@ + #include <linux/umh.h> + #include <linux/path.h> +-#ifdef CONFIG_BPFILTER -void __exit_umh(struct task_struct *tsk); - -static inline void exit_umh(struct task_struct *tsk) @@ include/linux/sched.h: static inline void rseq_execve(struct task_struct *t) - if (unlikely(tsk->flags & PF_UMH)) - __exit_umh(tsk); -} +-#else +-static inline void exit_umh(struct task_struct *tsk) +-{ +-} +-#endif - - #ifdef CONFIG_DEBUG_RSEQ - - void rseq_syscall(struct pt_regs *regs); - - ## include/linux/umd.h ## -@@ include/linux/umd.h: struct umd_info { + struct umd_info { const char *driver_name; struct file *pipe_to_umh; struct file *pipe_from_umh; @@ include/linux/umd.h: struct umd_info { }; ## kernel/exit.c ## +@@ + #include <linux/random.h> + #include <linux/rcuwait.h> + #include <linux/compat.h> +-#include <linux/umd.h> + + #include <linux/uaccess.h> + #include <asm/unistd.h> @@ kernel/exit.c: void __noreturn do_exit(long code) exit_task_namespaces(tsk); exit_task_work(tsk); @@ kernel/exit.c: void __noreturn do_exit(long code) ## kernel/umd.c ## @@ - #include <linux/fs_struct.h> + #include 
<linux/task_work.h> #include <linux/umd.h> -static LIST_HEAD(umh_list); -: ------------ > 15: 662deff06d76 umd: Stop using split_argv ^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH v2 01/15] umh: Capture the pid in umh_pipe_setup
  2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman
@ 2020-06-29 19:56   ` Eric W. Biederman
  2020-06-29 19:57   ` [PATCH v2 02/15] umh: Move setting PF_UMH into umh_pipe_setup Eric W. Biederman
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 19:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov,
	Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler,
	Luis Chamberlain, Linus Torvalds

The pid in struct subprocess_info is only used by umh_clean_and_save_pid
to write the pid into umh_info.

Instead always capture the pid on struct umh_info in umh_pipe_setup,
removing code that is specific to user mode drivers from the common
user path of user mode helpers.

Link: https://lkml.kernel.org/r/87h7uygf9i.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/umh.h | 1 -
 kernel/umh.c        | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/umh.h b/include/linux/umh.h
index 0c08de356d0d..aae16a0ebd0f 100644
--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -25,7 +25,6 @@ struct subprocess_info {
 	struct file *file;
 	int wait;
 	int retval;
-	pid_t pid;
 	int (*init)(struct subprocess_info *info, struct cred *new);
 	void (*cleanup)(struct subprocess_info *info);
 	void *data;
diff --git a/kernel/umh.c b/kernel/umh.c
index 79f139a7ca03..c2a582b3a2bf 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -102,7 +102,6 @@ static int call_usermodehelper_exec_async(void *data)

 	commit_creds(new);

-	sub_info->pid = task_pid_nr(current);
 	if (sub_info->file) {
 		retval = do_execve_file(sub_info->file,
 					sub_info->argv, sub_info->envp);
@@ -468,6 +467,7 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)

 	umh_info->pipe_to_umh = to_umh[1];
 	umh_info->pipe_from_umh = from_umh[0];
+	umh_info->pid = task_pid_nr(current);
 	return 0;
 }

@@ -476,13 +476,12 @@ static void umh_clean_and_save_pid(struct subprocess_info *info)
 	struct umh_info *umh_info = info->data;

 	/* cleanup if umh_pipe_setup() was successful but exec failed */
-	if (info->pid && info->retval) {
+	if (info->retval) {
 		fput(umh_info->pipe_to_umh);
 		fput(umh_info->pipe_from_umh);
 	}

 	argv_free(info->argv);
-	umh_info->pid = info->pid;
 }

 /**
--
2.25.0

^ permalink raw reply related	[flat|nested] 72+ messages in thread
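For readers who want to see the shape of this change outside the kernel, here is a minimal user-space sketch; it is an illustration under stated assumptions, not code from the patch. The names helper_info, driver_info, driver_setup and run_helper are hypothetical stand-ins for struct subprocess_info, struct umh_info, umh_pipe_setup and the generic helper core, and getpid() stands in for task_pid_nr(current). The point it tries to show is that the generic structure needs no pid field once the driver-specific setup callback records the pid in its own data.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for struct subprocess_info: note there is no pid field. */
struct helper_info {
	int (*init)(struct helper_info *info);
	void *data;
	int retval;
};

/* Hypothetical stand-in for struct umh_info: the driver owns the pid. */
struct driver_info {
	pid_t pid;
};

/*
 * Plays the role of umh_pipe_setup(): it runs in the context of the new
 * task, so it can record that task's pid directly in the driver data.
 */
static int driver_setup(struct helper_info *info)
{
	struct driver_info *drv = info->data;

	assert(drv != NULL);
	drv->pid = getpid();	/* the kernel code uses task_pid_nr(current) */
	return 0;
}

/* Plays the role of the generic helper core: it only invokes the callback. */
static int run_helper(struct helper_info *info)
{
	info->retval = info->init ? info->init(info) : 0;
	return info->retval;
}
```

A caller would fill in a helper_info with .init = driver_setup and .data pointing at its driver_info, call run_helper(), and then read the pid from the driver_info.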
* [PATCH v2 02/15] umh: Move setting PF_UMH into umh_pipe_setup
  2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman
  2020-06-29 19:56   ` [PATCH v2 01/15] umh: Capture the pid in umh_pipe_setup Eric W. Biederman
@ 2020-06-29 19:57   ` Eric W. Biederman
  2020-06-29 19:57   ` [PATCH v2 03/15] umh: Rename the user mode driver helpers for clarity Eric W. Biederman
                     ` (15 subsequent siblings)
  17 siblings, 0 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 19:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov,
	Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler,
	Luis Chamberlain, Linus Torvalds

I am separating the code specific to user mode drivers from the code
for ordinary user space helpers. Move setting of PF_UMH from
call_usermodehelper_exec_async which is core user mode helper code
into umh_pipe_setup which is user mode driver code.

The code is equally as easy to write in one location as the other and
the movement minimizes the impact of the user mode driver code on the
core of the user mode helper code.

Setting PF_UMH unconditionally is harmless as an action will only
happen if it is paired with an entry on umh_list.

Link: https://lkml.kernel.org/r/87bll6gf8t.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/umh.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/umh.c b/kernel/umh.c
index c2a582b3a2bf..e6b9d6636850 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -102,12 +102,10 @@ static int call_usermodehelper_exec_async(void *data)

 	commit_creds(new);

-	if (sub_info->file) {
+	if (sub_info->file)
 		retval = do_execve_file(sub_info->file,
 					sub_info->argv, sub_info->envp);
-		if (!retval)
-			current->flags |= PF_UMH;
-	} else
+	else
 		retval = do_execve(getname_kernel(sub_info->path),
 				   (const char __user *const __user *)sub_info->argv,
 				   (const char __user *const __user *)sub_info->envp);
@@ -468,6 +466,7 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 	umh_info->pipe_to_umh = to_umh[1];
 	umh_info->pipe_from_umh = from_umh[0];
 	umh_info->pid = task_pid_nr(current);
+	current->flags |= PF_UMH;
 	return 0;
 }

--
2.25.0

^ permalink raw reply related	[flat|nested] 72+ messages in thread
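The claim that setting PF_UMH unconditionally is harmless can be illustrated with a small user-space model of the exit path; this is only a sketch and not the kernel implementation. TASK_FLAG_UMH, struct task, struct driver_entry, exit_driver and exit_driver_slow are hypothetical names modelled on PF_UMH, the task, the umh_list entries, exit_umh and __exit_umh; locking (umh_list_lock in the kernel) is left out on purpose. The flag only gates the list walk, and cleanup happens only when a list entry with a matching pid exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASK_FLAG_UMH 0x1u	/* hypothetical stand-in for PF_UMH */

/* Hypothetical stand-in for an entry on umh_list. */
struct driver_entry {
	int pid;
	bool cleaned;			/* set where the kernel would call info->cleanup() */
	struct driver_entry *next;
};

/* Hypothetical stand-in for the exiting task. */
struct task {
	int pid;
	unsigned int flags;
};

/* Model of __exit_umh(): unlink the entry matching the task's pid, then clean up. */
static bool exit_driver_slow(struct driver_entry **list, const struct task *tsk)
{
	struct driver_entry **pp;

	for (pp = list; *pp; pp = &(*pp)->next) {
		struct driver_entry *entry = *pp;

		if (entry->pid == tsk->pid) {
			*pp = entry->next;
			entry->next = NULL;
			entry->cleaned = true;
			return true;
		}
	}
	return false;	/* flag was set, but nothing is registered: no action */
}

/* Model of exit_umh(): the flag is only a cheap filter in front of the lookup. */
static bool exit_driver(struct driver_entry **list, const struct task *tsk)
{
	assert(list != NULL && tsk != NULL);
	if (!(tsk->flags & TASK_FLAG_UMH))
		return false;
	return exit_driver_slow(list, tsk);
}
```

In this model a task that carries the flag but has no list entry falls through the loop and nothing happens, which is the property the commit message relies on.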
* [PATCH v2 03/15] umh: Rename the user mode driver helpers for clarity
  2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman
  2020-06-29 19:56   ` [PATCH v2 01/15] umh: Capture the pid in umh_pipe_setup Eric W. Biederman
  2020-06-29 19:57   ` [PATCH v2 02/15] umh: Move setting PF_UMH into umh_pipe_setup Eric W. Biederman
@ 2020-06-29 19:57   ` Eric W. Biederman
  2020-06-29 19:59   ` [PATCH v2 04/15] umh: Remove call_usermodehelper_setup_file Eric W. Biederman
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 19:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov,
	Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler,
	Luis Chamberlain, Linus Torvalds

Now that the functionality of umh_pipe_setup and
umh_clean_and_save_pid has changed, their names are too specific and
don't make much sense. Instead name them umd_setup and umd_cleanup
for their functional role in setting up user mode drivers.

Link: https://lkml.kernel.org/r/875zbegf82.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/umh.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/umh.c b/kernel/umh.c
index e6b9d6636850..26c3d493f168 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -429,7 +429,7 @@ struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
 	return sub_info;
 }

-static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
+static int umd_setup(struct subprocess_info *info, struct cred *new)
 {
 	struct umh_info *umh_info = info->data;
 	struct file *from_umh[2];
@@ -470,11 +470,11 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
 	return 0;
 }

-static void umh_clean_and_save_pid(struct subprocess_info *info)
+static void umd_cleanup(struct subprocess_info *info)
 {
 	struct umh_info *umh_info = info->data;

-	/* cleanup if umh_pipe_setup() was successful but exec failed */
+	/* cleanup if umh_setup() was successful but exec failed */
 	if (info->retval) {
 		fput(umh_info->pipe_to_umh);
 		fput(umh_info->pipe_from_umh);
@@ -520,8 +520,8 @@ int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
 	}

 	err = -ENOMEM;
-	sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup,
-						  umh_clean_and_save_pid, info);
+	sub_info = call_usermodehelper_setup_file(file, umd_setup, umd_cleanup,
+						  info);
 	if (!sub_info)
 		goto out;

--
2.25.0

^ permalink raw reply related	[flat|nested] 72+ messages in thread
* [PATCH v2 04/15] umh: Remove call_usermodehelper_setup_file.
  2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman
                     ` (2 preceding siblings ...)
  2020-06-29 19:57   ` [PATCH v2 03/15] umh: Rename the user mode driver helpers for clarity Eric W. Biederman
@ 2020-06-29 19:59   ` Eric W. Biederman
  2020-06-29 20:00   ` [PATCH v2 05/15] umh: Separate the user mode driver and the user mode helper support Eric W. Biederman
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 19:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov,
	Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler,
	Luis Chamberlain, Linus Torvalds

The only caller of call_usermodehelper_setup_file is fork_usermode_blob.
In fork_usermode_blob replace call_usermodehelper_setup_file with
call_usermodehelper_setup and delete call_usermodehelper_setup_file.

For this to work the argv_free is moved from umd_cleanup
to fork_usermode_blob.

Link: https://lkml.kernel.org/r/87zh8qf0mp.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/umh.h |  3 ---
 kernel/umh.c        | 42 +++++++++++-------------------------------
 2 files changed, 11 insertions(+), 34 deletions(-)

diff --git a/include/linux/umh.h b/include/linux/umh.h
index aae16a0ebd0f..de08af00c68a 100644
--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -39,9 +39,6 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp,
 			  int (*init)(struct subprocess_info *info, struct cred *new),
 			  void (*cleanup)(struct subprocess_info *), void *data);

-struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
-			  int (*init)(struct subprocess_info *info, struct cred *new),
-			  void (*cleanup)(struct subprocess_info *), void *data);
 struct umh_info {
 	const char *cmdline;
 	struct file *pipe_to_umh;
diff --git a/kernel/umh.c b/kernel/umh.c
index 26c3d493f168..b8fa9b99b366 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -402,33 +402,6 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
 }
 EXPORT_SYMBOL(call_usermodehelper_setup);

-struct subprocess_info *call_usermodehelper_setup_file(struct file *file,
-		int (*init)(struct subprocess_info *info, struct cred *new),
-		void (*cleanup)(struct subprocess_info *info), void *data)
-{
-	struct subprocess_info *sub_info;
-	struct umh_info *info = data;
-	const char *cmdline = (info->cmdline) ? info->cmdline : "usermodehelper";
-
-	sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL);
-	if (!sub_info)
-		return NULL;
-
-	sub_info->argv = argv_split(GFP_KERNEL, cmdline, NULL);
-	if (!sub_info->argv) {
-		kfree(sub_info);
-		return NULL;
-	}
-
-	INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);
-	sub_info->path = "none";
-	sub_info->file = file;
-	sub_info->init = init;
-	sub_info->cleanup = cleanup;
-	sub_info->data = data;
-	return sub_info;
-}
-
 static int umd_setup(struct subprocess_info *info, struct cred *new)
 {
 	struct umh_info *umh_info = info->data;
@@ -479,8 +452,6 @@ static void umd_cleanup(struct subprocess_info *info)
 		fput(umh_info->pipe_to_umh);
 		fput(umh_info->pipe_from_umh);
 	}
-
-	argv_free(info->argv);
 }

 /**
@@ -501,7 +472,9 @@ static void umd_cleanup(struct subprocess_info *info)
  */
 int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
 {
+	const char *cmdline = (info->cmdline) ? info->cmdline : "usermodehelper";
 	struct subprocess_info *sub_info;
+	char **argv = NULL;
 	struct file *file;
 	ssize_t written;
 	loff_t pos = 0;
@@ -520,11 +493,16 @@ int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
 	}

 	err = -ENOMEM;
-	sub_info = call_usermodehelper_setup_file(file, umd_setup, umd_cleanup,
-						  info);
+	argv = argv_split(GFP_KERNEL, cmdline, NULL);
+	if (!argv)
+		goto out;
+
+	sub_info = call_usermodehelper_setup("none", argv, NULL, GFP_KERNEL,
+					     umd_setup, umd_cleanup, info);
 	if (!sub_info)
 		goto out;

+	sub_info->file = file;
 	err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
 	if (!err) {
 		mutex_lock(&umh_list_lock);
@@ -532,6 +510,8 @@ int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
 		mutex_unlock(&umh_list_lock);
 	}
 out:
+	if (argv)
+		argv_free(argv);
 	fput(file);
 	return err;
 }
--
2.25.0

^ permalink raw reply related	[flat|nested] 72+ messages in thread
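The ownership change in this patch (the caller that splits the command line is also the one that frees it, on a single exit path) can be sketched in user space as follows. This is an illustration only, under the assumption that a simple whitespace splitter is close enough to argv_split for the purpose; split_words, free_words, launch and count_args are hypothetical names, and launch stands in for fork_usermode_blob with the exec step replaced by a callback.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated vector built by split_words(); NULL is accepted. */
static void free_words(char **argv)
{
	char **p;

	if (!argv)
		return;
	for (p = argv; *p; p++)
		free(*p);
	free(argv);
}

/* Hypothetical user-space analogue of argv_split(): split on whitespace. */
static char **split_words(const char *line)
{
	size_t count = 0, i;
	const char *p = line;
	char **argv;

	/* First pass: count the words. */
	while (*p) {
		while (isspace((unsigned char)*p))
			p++;
		if (!*p)
			break;
		count++;
		while (*p && !isspace((unsigned char)*p))
			p++;
	}

	/* calloc() zeroes the vector, so it is always NULL-terminated. */
	argv = calloc(count + 1, sizeof(*argv));
	if (!argv)
		return NULL;

	/* Second pass: copy each word. */
	p = line;
	for (i = 0; i < count; i++) {
		const char *start;
		size_t len;

		while (isspace((unsigned char)*p))
			p++;
		start = p;
		while (*p && !isspace((unsigned char)*p))
			p++;
		len = (size_t)(p - start);
		argv[i] = malloc(len + 1);
		if (!argv[i]) {
			free_words(argv);
			return NULL;
		}
		memcpy(argv[i], start, len);
		argv[i][len] = '\0';
	}
	return argv;
}

/* Example callback: report how many arguments it was handed. */
static int count_args(char **argv)
{
	int n = 0;

	while (argv[n])
		n++;
	return n;
}

/* Model of the patched caller: allocate argv, use it, free it at "out". */
static int launch(const char *cmdline, int (*exec_fn)(char **argv))
{
	char **argv;
	int err = -ENOMEM;

	assert(exec_fn != NULL);
	argv = split_words(cmdline ? cmdline : "usermodehelper");
	if (!argv)
		goto out;
	err = exec_fn(argv);
out:
	free_words(argv);
	return err;
}
```

Because the vector never escapes launch(), the cleanup callback no longer needs to know about it, which mirrors why argv_free could be dropped from umd_cleanup.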
* [PATCH v2 05/15] umh: Separate the user mode driver and the user mode helper support 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (3 preceding siblings ...) 2020-06-29 19:59 ` [PATCH v2 04/15] umh: Remove call_usermodehelper_setup_file Eric W. Biederman @ 2020-06-29 20:00 ` Eric W. Biederman 2020-06-30 16:58 ` Linus Torvalds 2020-06-29 20:01 ` [PATCH v2 06/15] umd: For clarity rename umh_info umd_info Eric W. Biederman ` (12 subsequent siblings) 17 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-06-29 20:00 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds This makes it clear which code is part of the core user mode helper support and which code is needed to implement user mode drivers. This makes the kernel smaller for everyone who does not use a usermode driver. v2: Moved exit_umh from sched.h to umd.h and handle the case when the code is compiled out. Link: https://lkml.kernel.org/r/87tuyyf0ln.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/bpfilter.h | 2 +- include/linux/sched.h | 8 --- include/linux/umd.h | 30 ++++++++ include/linux/umh.h | 10 --- kernel/Makefile | 1 + kernel/exit.c | 1 + kernel/umd.c | 146 +++++++++++++++++++++++++++++++++++++++ kernel/umh.c | 139 ------------------------------------- 8 files changed, 179 insertions(+), 158 deletions(-) create mode 100644 include/linux/umd.h create mode 100644 kernel/umd.c diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h index d815622cd31e..b42e44e29033 100644 --- a/include/linux/bpfilter.h +++ b/include/linux/bpfilter.h @@ -3,7 +3,7 @@ #define _LINUX_BPFILTER_H #include <uapi/linux/bpfilter.h> -#include <linux/umh.h> +#include <linux/umd.h> struct sock; int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval, diff --git a/include/linux/sched.h b/include/linux/sched.h index b62e6aaf28f0..59d1e92bb88e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2020,14 +2020,6 @@ static inline void rseq_execve(struct task_struct *t) #endif -void __exit_umh(struct task_struct *tsk); - -static inline void exit_umh(struct task_struct *tsk) -{ - if (unlikely(tsk->flags & PF_UMH)) - __exit_umh(tsk); -} - #ifdef CONFIG_DEBUG_RSEQ void rseq_syscall(struct pt_regs *regs); diff --git a/include/linux/umd.h b/include/linux/umd.h new file mode 100644 index 000000000000..ef40bee590c1 --- /dev/null +++ b/include/linux/umd.h @@ -0,0 +1,30 @@ +#ifndef __LINUX_UMD_H__ +#define __LINUX_UMD_H__ + +#include <linux/umh.h> + +#ifdef CONFIG_BPFILTER +void __exit_umh(struct task_struct *tsk); + +static inline void exit_umh(struct task_struct *tsk) +{ + if (unlikely(tsk->flags & PF_UMH)) + __exit_umh(tsk); +} +#else +static inline void exit_umh(struct task_struct *tsk) +{ +} +#endif + +struct umh_info { + const char *cmdline; + struct file *pipe_to_umh; + struct file *pipe_from_umh; + struct list_head list; + void (*cleanup)(struct umh_info *info); + pid_t pid; +}; +int 
fork_usermode_blob(void *data, size_t len, struct umh_info *info); + +#endif /* __LINUX_UMD_H__ */ diff --git a/include/linux/umh.h b/include/linux/umh.h index de08af00c68a..73173c4a07e5 100644 --- a/include/linux/umh.h +++ b/include/linux/umh.h @@ -39,16 +39,6 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp, int (*init)(struct subprocess_info *info, struct cred *new), void (*cleanup)(struct subprocess_info *), void *data); -struct umh_info { - const char *cmdline; - struct file *pipe_to_umh; - struct file *pipe_from_umh; - struct list_head list; - void (*cleanup)(struct umh_info *info); - pid_t pid; -}; -int fork_usermode_blob(void *data, size_t len, struct umh_info *info); - extern int call_usermodehelper_exec(struct subprocess_info *info, int wait); diff --git a/kernel/Makefile b/kernel/Makefile index f3218bc5ec69..a81d7354323c 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -12,6 +12,7 @@ obj-y = fork.o exec_domain.o panic.o \ notifier.o ksysfs.o cred.o reboot.o \ async.o range.o smpboot.o ucount.o +obj-$(CONFIG_BPFILTER) += umd.o obj-$(CONFIG_MODULES) += kmod.o obj-$(CONFIG_MULTIUSER) += groups.o diff --git a/kernel/exit.c b/kernel/exit.c index 727150f28103..b94fe03e609c 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -63,6 +63,7 @@ #include <linux/random.h> #include <linux/rcuwait.h> #include <linux/compat.h> +#include <linux/umd.h> #include <linux/uaccess.h> #include <asm/unistd.h> diff --git a/kernel/umd.c b/kernel/umd.c new file mode 100644 index 000000000000..99af9d594eca --- /dev/null +++ b/kernel/umd.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * umd - User mode driver support + */ +#include <linux/shmem_fs.h> +#include <linux/pipe_fs_i.h> +#include <linux/umd.h> + +static LIST_HEAD(umh_list); +static DEFINE_MUTEX(umh_list_lock); + +static int umd_setup(struct subprocess_info *info, struct cred *new) +{ + struct umh_info *umh_info = info->data; + struct file *from_umh[2]; + struct file *to_umh[2]; + 
int err; + + /* create pipe to send data to umh */ + err = create_pipe_files(to_umh, 0); + if (err) + return err; + err = replace_fd(0, to_umh[0], 0); + fput(to_umh[0]); + if (err < 0) { + fput(to_umh[1]); + return err; + } + + /* create pipe to receive data from umh */ + err = create_pipe_files(from_umh, 0); + if (err) { + fput(to_umh[1]); + replace_fd(0, NULL, 0); + return err; + } + err = replace_fd(1, from_umh[1], 0); + fput(from_umh[1]); + if (err < 0) { + fput(to_umh[1]); + replace_fd(0, NULL, 0); + fput(from_umh[0]); + return err; + } + + umh_info->pipe_to_umh = to_umh[1]; + umh_info->pipe_from_umh = from_umh[0]; + umh_info->pid = task_pid_nr(current); + current->flags |= PF_UMH; + return 0; +} + +static void umd_cleanup(struct subprocess_info *info) +{ + struct umh_info *umh_info = info->data; + + /* cleanup if umh_setup() was successful but exec failed */ + if (info->retval) { + fput(umh_info->pipe_to_umh); + fput(umh_info->pipe_from_umh); + } +} + +/** + * fork_usermode_blob - fork a blob of bytes as a usermode process + * @data: a blob of bytes that can be do_execv-ed as a file + * @len: length of the blob + * @info: information about usermode process (shouldn't be NULL) + * + * If info->cmdline is set it will be used as command line for the + * user process, else "usermodehelper" is used. + * + * Returns either negative error or zero which indicates success + * in executing a blob of bytes as a usermode process. In such + * case 'struct umh_info *info' is populated with two pipes + * and a pid of the process. The caller is responsible for health + * check of the user process, killing it via pid, and closing the + * pipes when user process is no longer needed. + */ +int fork_usermode_blob(void *data, size_t len, struct umh_info *info) +{ + const char *cmdline = (info->cmdline) ? 
info->cmdline : "usermodehelper"; + struct subprocess_info *sub_info; + char **argv = NULL; + struct file *file; + ssize_t written; + loff_t pos = 0; + int err; + + file = shmem_kernel_file_setup("", len, 0); + if (IS_ERR(file)) + return PTR_ERR(file); + + written = kernel_write(file, data, len, &pos); + if (written != len) { + err = written; + if (err >= 0) + err = -ENOMEM; + goto out; + } + + err = -ENOMEM; + argv = argv_split(GFP_KERNEL, cmdline, NULL); + if (!argv) + goto out; + + sub_info = call_usermodehelper_setup("none", argv, NULL, GFP_KERNEL, + umd_setup, umd_cleanup, info); + if (!sub_info) + goto out; + + sub_info->file = file; + err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); + if (!err) { + mutex_lock(&umh_list_lock); + list_add(&info->list, &umh_list); + mutex_unlock(&umh_list_lock); + } +out: + if (argv) + argv_free(argv); + fput(file); + return err; +} +EXPORT_SYMBOL_GPL(fork_usermode_blob); + +void __exit_umh(struct task_struct *tsk) +{ + struct umh_info *info; + pid_t pid = tsk->pid; + + mutex_lock(&umh_list_lock); + list_for_each_entry(info, &umh_list, list) { + if (info->pid == pid) { + list_del(&info->list); + mutex_unlock(&umh_list_lock); + goto out; + } + } + mutex_unlock(&umh_list_lock); + return; +out: + if (info->cleanup) + info->cleanup(info); +} + diff --git a/kernel/umh.c b/kernel/umh.c index b8fa9b99b366..3e4e453d45c8 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -26,8 +26,6 @@ #include <linux/ptrace.h> #include <linux/async.h> #include <linux/uaccess.h> -#include <linux/shmem_fs.h> -#include <linux/pipe_fs_i.h> #include <trace/events/module.h> @@ -38,8 +36,6 @@ static kernel_cap_t usermodehelper_bset = CAP_FULL_SET; static kernel_cap_t usermodehelper_inheritable = CAP_FULL_SET; static DEFINE_SPINLOCK(umh_sysctl_lock); static DECLARE_RWSEM(umhelper_sem); -static LIST_HEAD(umh_list); -static DEFINE_MUTEX(umh_list_lock); static void call_usermodehelper_freeinfo(struct subprocess_info *info) { @@ -402,121 +398,6 @@ struct 
subprocess_info *call_usermodehelper_setup(const char *path, char **argv, } EXPORT_SYMBOL(call_usermodehelper_setup); -static int umd_setup(struct subprocess_info *info, struct cred *new) -{ - struct umh_info *umh_info = info->data; - struct file *from_umh[2]; - struct file *to_umh[2]; - int err; - - /* create pipe to send data to umh */ - err = create_pipe_files(to_umh, 0); - if (err) - return err; - err = replace_fd(0, to_umh[0], 0); - fput(to_umh[0]); - if (err < 0) { - fput(to_umh[1]); - return err; - } - - /* create pipe to receive data from umh */ - err = create_pipe_files(from_umh, 0); - if (err) { - fput(to_umh[1]); - replace_fd(0, NULL, 0); - return err; - } - err = replace_fd(1, from_umh[1], 0); - fput(from_umh[1]); - if (err < 0) { - fput(to_umh[1]); - replace_fd(0, NULL, 0); - fput(from_umh[0]); - return err; - } - - umh_info->pipe_to_umh = to_umh[1]; - umh_info->pipe_from_umh = from_umh[0]; - umh_info->pid = task_pid_nr(current); - current->flags |= PF_UMH; - return 0; -} - -static void umd_cleanup(struct subprocess_info *info) -{ - struct umh_info *umh_info = info->data; - - /* cleanup if umh_setup() was successful but exec failed */ - if (info->retval) { - fput(umh_info->pipe_to_umh); - fput(umh_info->pipe_from_umh); - } -} - -/** - * fork_usermode_blob - fork a blob of bytes as a usermode process - * @data: a blob of bytes that can be do_execv-ed as a file - * @len: length of the blob - * @info: information about usermode process (shouldn't be NULL) - * - * If info->cmdline is set it will be used as command line for the - * user process, else "usermodehelper" is used. - * - * Returns either negative error or zero which indicates success - * in executing a blob of bytes as a usermode process. In such - * case 'struct umh_info *info' is populated with two pipes - * and a pid of the process. The caller is responsible for health - * check of the user process, killing it via pid, and closing the - * pipes when user process is no longer needed. 
- */ -int fork_usermode_blob(void *data, size_t len, struct umh_info *info) -{ - const char *cmdline = (info->cmdline) ? info->cmdline : "usermodehelper"; - struct subprocess_info *sub_info; - char **argv = NULL; - struct file *file; - ssize_t written; - loff_t pos = 0; - int err; - - file = shmem_kernel_file_setup("", len, 0); - if (IS_ERR(file)) - return PTR_ERR(file); - - written = kernel_write(file, data, len, &pos); - if (written != len) { - err = written; - if (err >= 0) - err = -ENOMEM; - goto out; - } - - err = -ENOMEM; - argv = argv_split(GFP_KERNEL, cmdline, NULL); - if (!argv) - goto out; - - sub_info = call_usermodehelper_setup("none", argv, NULL, GFP_KERNEL, - umd_setup, umd_cleanup, info); - if (!sub_info) - goto out; - - sub_info->file = file; - err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); - if (!err) { - mutex_lock(&umh_list_lock); - list_add(&info->list, &umh_list); - mutex_unlock(&umh_list_lock); - } -out: - if (argv) - argv_free(argv); - fput(file); - return err; -} -EXPORT_SYMBOL_GPL(fork_usermode_blob); - /** * call_usermodehelper_exec - start a usermode application * @sub_info: information about the subprocessa @@ -678,26 +559,6 @@ static int proc_cap_handler(struct ctl_table *table, int write, return 0; } -void __exit_umh(struct task_struct *tsk) -{ - struct umh_info *info; - pid_t pid = tsk->pid; - - mutex_lock(&umh_list_lock); - list_for_each_entry(info, &umh_list, list) { - if (info->pid == pid) { - list_del(&info->list); - mutex_unlock(&umh_list_lock); - goto out; - } - } - mutex_unlock(&umh_list_lock); - return; -out: - if (info->cleanup) - info->cleanup(info); -} - struct ctl_table usermodehelper_table[] = { { .procname = "bset", -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
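The two-pipe arrangement that umd_setup() builds can be tried outside the kernel. What follows is a minimal userspace sketch of the same idea, not the kernel code: pipe(), fork(), dup2() and execvp() stand in for create_pipe_files(), the helper thread, replace_fd() and exec. The names struct child_pipes and spawn_with_pipes() are invented for this illustration, and the sketch assumes file descriptors 0-2 are already open in the caller.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for the pipe/pid fields of umd_info. */
struct child_pipes {
	int to_child;	/* write end; the child reads it as stdin */
	int from_child;	/* read end; the child writes it as stdout */
	pid_t pid;
};

/*
 * Start argv[0] with stdin and stdout connected to two fresh pipes.
 * Returns 0 on success or a negative errno. On failure every
 * descriptor created so far is closed again, mirroring the unwind
 * order in umd_setup(). Assumes fds 0-2 are open in the caller.
 */
int spawn_with_pipes(char *const argv[], struct child_pipes *cp)
{
	int to[2], from[2], err;
	pid_t pid;

	assert(argv && argv[0] && cp);

	/* pipe used to send data to the child */
	if (pipe(to) < 0)
		return -errno;

	/* pipe used to receive data from the child */
	if (pipe(from) < 0) {
		err = -errno;
		close(to[0]);
		close(to[1]);
		return err;
	}

	pid = fork();
	if (pid < 0) {
		err = -errno;
		close(to[0]);
		close(to[1]);
		close(from[0]);
		close(from[1]);
		return err;
	}

	if (pid == 0) {
		/* child: install the pipe ends as fd 0 and fd 1 */
		if (dup2(to[0], STDIN_FILENO) < 0 ||
		    dup2(from[1], STDOUT_FILENO) < 0)
			_exit(126);
		close(to[0]);
		close(to[1]);
		close(from[0]);
		close(from[1]);
		execvp(argv[0], argv);
		_exit(127);	/* exec failed */
	}

	/* parent: keep only the ends it will use */
	close(to[0]);
	close(from[1]);
	cp->to_child = to[1];
	cp->from_child = from[0];
	cp->pid = pid;
	return 0;
}
```

As in the kernel version, the caller owns both pipe ends and the pid afterwards: it is responsible for closing the descriptors and reaping the child.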
* Re: [PATCH v2 05/15] umh: Separate the user mode driver and the user mode helper support
  2020-06-29 20:00 ` [PATCH v2 05/15] umh: Separate the user mode driver and the user mode helper support Eric W. Biederman
@ 2020-06-30 16:58   ` Linus Torvalds
  2020-07-01 17:18     ` Eric W. Biederman
  0 siblings, 1 reply; 72+ messages in thread
From: Linus Torvalds @ 2020-06-30 16:58 UTC
To: Eric W. Biederman
Cc: Linux Kernel Mailing List, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain

On Mon, Jun 29, 2020 at 1:05 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> This makes it clear which code is part of the core user mode
> helper support and which code is needed to implement user mode
> drivers.
>
>  kernel/umd.c | 146 +++++++++++++++++++++++++++++++++++++++
>  kernel/umh.c | 139 -------------------------------------

I certainly don't object to the split, but I hate the name.

We have uml, umd and umh for user mode {linux, drivers, helper}
respectively. And honestly, I don't see the point in using an obscure
and unreadable TLA for something like this.

I really don't think it would hurt to write out even the full name
with "usermode_driver.c" or something like that, would it?

Then "umd" could be continued to be used as a prefix for the helper
functions, by all means, but if we start renaming files, can we do it
properly?

               Linus
* Re: [PATCH v2 05/15] umh: Separate the user mode driver and the user mode helper support
  2020-06-30 16:58   ` Linus Torvalds
@ 2020-07-01 17:18     ` Eric W. Biederman
  2020-07-01 17:42       ` Alexei Starovoitov
  0 siblings, 1 reply; 72+ messages in thread
From: Eric W. Biederman @ 2020-07-01 17:18 UTC
To: Linus Torvalds
Cc: Linux Kernel Mailing List, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Jun 29, 2020 at 1:05 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> This makes it clear which code is part of the core user mode
>> helper support and which code is needed to implement user mode
>> drivers.
>>
>>  kernel/umd.c | 146 +++++++++++++++++++++++++++++++++++++++
>>  kernel/umh.c | 139 -------------------------------------
>
> I certainly don't object to the split, but I hate the name.
>
> We have uml, umd and umh for user mode {linux, drivers, helper}
> respectively. And honestly, I don't see the point in using an obscure
> and unreadable TLA for something like this.
>
> I really don't think it would hurt to write out even the full name
> with "usermode_driver.c" or something like that, would it?
>
> Then "umd" could be continued to be used as a prefix for the helper
> functions, by all means, but if we start renaming files, can we do it
> properly?

I will take care of it. I have to respin the patchset for a silly bug
anyways.

Eric
* Re: [PATCH v2 05/15] umh: Separate the user mode driver and the user mode helper support
  2020-07-01 17:18     ` Eric W. Biederman
@ 2020-07-01 17:42       ` Alexei Starovoitov
  0 siblings, 0 replies; 72+ messages in thread
From: Alexei Starovoitov @ 2020-07-01 17:42 UTC
To: Eric W. Biederman
Cc: Linus Torvalds, Linux Kernel Mailing List, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain

On Wed, Jul 1, 2020 at 10:23 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > On Mon, Jun 29, 2020 at 1:05 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>
> >> This makes it clear which code is part of the core user mode
> >> helper support and which code is needed to implement user mode
> >> drivers.
> >>
> >>  kernel/umd.c | 146 +++++++++++++++++++++++++++++++++++++++
> >>  kernel/umh.c | 139 -------------------------------------
> >
> > I certainly don't object to the split, but I hate the name.
> >
> > We have uml, umd and umh for user mode {linux, drivers, helper}
> > respectively. And honestly, I don't see the point in using an obscure
> > and unreadable TLA for something like this.
> >
> > I really don't think it would hurt to write out even the full name
> > with "usermode_driver.c" or something like that, would it?
> >
> > Then "umd" could be continued to be used as a prefix for the helper
> > functions, by all means, but if we start renaming files, can we do it
> > properly?
>
> I will take care of it. I have to respin the patchset for a silly bug anyways.

I guess with the header name too: umd.h -> usermode_driver.h ?
* [PATCH v2 06/15] umd: For clarity rename umh_info umd_info
  2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman
                     ` (4 preceding siblings ...)
  2020-06-29 20:00   ` [PATCH v2 05/15] umh: Separate the user mode driver and the user mode helper support Eric W. Biederman
@ 2020-06-29 20:01   ` Eric W. Biederman
  2020-06-29 20:02   ` [PATCH v2 07/15] umd: Rename umd_info.cmdline umd_info.driver_name Eric W. Biederman
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 20:01 UTC
To: linux-kernel
Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

This structure is only used for user mode drivers so change
the prefix from umh to umd to make that clear.

Link: https://lkml.kernel.org/r/87o8p6f0kw.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/bpfilter.h    |  2 +-
 include/linux/umd.h         |  6 +++---
 kernel/umd.c                | 20 ++++++++++----------
 net/ipv4/bpfilter/sockopt.c |  2 +-
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h
index b42e44e29033..4b43d2240172 100644
--- a/include/linux/bpfilter.h
+++ b/include/linux/bpfilter.h
@@ -11,7 +11,7 @@ int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval,
 int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval,
 			    int __user *optlen);
 struct bpfilter_umh_ops {
-	struct umh_info info;
+	struct umd_info info;
 	/* since ip_getsockopt() can run in parallel, serialize access to umh */
 	struct mutex lock;
 	int (*sockopt)(struct sock *sk, int optname,
diff --git a/include/linux/umd.h b/include/linux/umd.h
index ef40bee590c1..58a9c603c78d 100644
--- a/include/linux/umd.h
+++ b/include/linux/umd.h
@@ -17,14 +17,14 @@ static inline void exit_umh(struct task_struct *tsk)
 }
 #endif
 
-struct umh_info {
+struct umd_info {
 	const char *cmdline;
 	struct file *pipe_to_umh;
 	struct file *pipe_from_umh;
 	struct list_head list;
-	void (*cleanup)(struct umh_info *info);
+	void (*cleanup)(struct umd_info *info);
 	pid_t pid;
 };
-int fork_usermode_blob(void *data, size_t len, struct umh_info *info);
+int fork_usermode_blob(void *data, size_t len, struct umd_info *info);
 
 #endif /* __LINUX_UMD_H__ */
diff --git a/kernel/umd.c b/kernel/umd.c
index 99af9d594eca..f7dacb19c705 100644
--- a/kernel/umd.c
+++ b/kernel/umd.c
@@ -11,7 +11,7 @@ static DEFINE_MUTEX(umh_list_lock);
 
 static int umd_setup(struct subprocess_info *info, struct cred *new)
 {
-	struct umh_info *umh_info = info->data;
+	struct umd_info *umd_info = info->data;
 	struct file *from_umh[2];
 	struct file *to_umh[2];
 	int err;
@@ -43,21 +43,21 @@ static int umd_setup(struct subprocess_info *info, struct cred *new)
 		return err;
 	}
 
-	umh_info->pipe_to_umh = to_umh[1];
-	umh_info->pipe_from_umh = from_umh[0];
-	umh_info->pid = task_pid_nr(current);
+	umd_info->pipe_to_umh = to_umh[1];
+	umd_info->pipe_from_umh = from_umh[0];
+	umd_info->pid = task_pid_nr(current);
 	current->flags |= PF_UMH;
 	return 0;
 }
 
 static void umd_cleanup(struct subprocess_info *info)
 {
-	struct umh_info *umh_info = info->data;
+	struct umd_info *umd_info = info->data;
 
 	/* cleanup if umh_setup() was successful but exec failed */
 	if (info->retval) {
-		fput(umh_info->pipe_to_umh);
-		fput(umh_info->pipe_from_umh);
+		fput(umd_info->pipe_to_umh);
+		fput(umd_info->pipe_from_umh);
 	}
 }
 
@@ -72,12 +72,12 @@ static void umd_cleanup(struct subprocess_info *info)
  *
  * Returns either negative error or zero which indicates success
  * in executing a blob of bytes as a usermode process. In such
- * case 'struct umh_info *info' is populated with two pipes
+ * case 'struct umd_info *info' is populated with two pipes
  * and a pid of the process. The caller is responsible for health
  * check of the user process, killing it via pid, and closing the
  * pipes when user process is no longer needed.
  */
-int fork_usermode_blob(void *data, size_t len, struct umh_info *info)
+int fork_usermode_blob(void *data, size_t len, struct umd_info *info)
 {
 	const char *cmdline = (info->cmdline) ? info->cmdline : "usermodehelper";
 	struct subprocess_info *sub_info;
@@ -126,7 +126,7 @@ EXPORT_SYMBOL_GPL(fork_usermode_blob);
 
 void __exit_umh(struct task_struct *tsk)
 {
-	struct umh_info *info;
+	struct umd_info *info;
 	pid_t pid = tsk->pid;
 
 	mutex_lock(&umh_list_lock);
diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c
index 0480918bfc7c..c0dbcc86fcdb 100644
--- a/net/ipv4/bpfilter/sockopt.c
+++ b/net/ipv4/bpfilter/sockopt.c
@@ -12,7 +12,7 @@
 struct bpfilter_umh_ops bpfilter_ops;
 EXPORT_SYMBOL_GPL(bpfilter_ops);
 
-static void bpfilter_umh_cleanup(struct umh_info *info)
+static void bpfilter_umh_cleanup(struct umd_info *info)
 {
 	mutex_lock(&bpfilter_ops.lock);
 	bpfilter_ops.stop = true;
-- 
2.25.0
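The __exit_umh() hunk in this patch touches a pattern worth spelling out: the matching entry is unlinked while umh_list_lock is held, and info->cleanup() runs only after the lock has been dropped. Below is a minimal userspace sketch of that pattern, assuming a pthread mutex and a singly linked list in place of the kernel's mutex and list_head. struct driver, driver_register(), driver_exited() and driver_mark_stopped() are hypothetical names for this illustration, not kernel interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the list-related parts of umd_info. */
struct driver {
	pid_t pid;
	int stopped;
	void (*cleanup)(struct driver *d);
	struct driver *next;
};

static struct driver *drivers;
static pthread_mutex_t drivers_lock = PTHREAD_MUTEX_INITIALIZER;

/* Example callback, loosely modelled on bpfilter_umh_cleanup(). */
void driver_mark_stopped(struct driver *d)
{
	d->stopped = 1;
}

/* Add a driver to the tracked list. */
void driver_register(struct driver *d)
{
	assert(d);
	pthread_mutex_lock(&drivers_lock);
	d->next = drivers;
	drivers = d;
	pthread_mutex_unlock(&drivers_lock);
}

/*
 * Unlink the entry for @pid while holding the lock, then call its
 * cleanup hook with the lock released, so the hook is free to take
 * locks of its own. Returns 1 if an entry was found, 0 otherwise.
 */
int driver_exited(pid_t pid)
{
	struct driver **pp, *found = NULL;

	pthread_mutex_lock(&drivers_lock);
	for (pp = &drivers; *pp; pp = &(*pp)->next) {
		if ((*pp)->pid == pid) {
			found = *pp;
			*pp = found->next;
			found->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&drivers_lock);

	if (!found)
		return 0;
	if (found->cleanup)
		found->cleanup(found);
	return 1;
}
```

Because the entry is already off the list when the hook runs, a second exit notification for the same pid finds nothing and does not call cleanup twice.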
* [PATCH v2 07/15] umd: Rename umd_info.cmdline umd_info.driver_name
  2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman
                     ` (5 preceding siblings ...)
  2020-06-29 20:01   ` [PATCH v2 06/15] umd: For clarity rename umh_info umd_info Eric W. Biederman
@ 2020-06-29 20:02   ` Eric W. Biederman
  2020-06-29 20:03   ` [PATCH v2 08/15] umd: Transform fork_usermode_blob into fork_usermode_driver Eric W. Biederman
                     ` (10 subsequent siblings)
  17 siblings, 0 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 20:02 UTC
To: linux-kernel
Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

The only thing supplied in the cmdline today is the driver name, so
rename the field to clarify the code.

As this value is always supplied, stop trying to handle the case of
a NULL cmdline.

Additionally, since we now have a name we can count on, use the
driver_name any place where the code is looking for a name
of the binary.

Link: https://lkml.kernel.org/r/87imfef0k3.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/umd.h         |  2 +-
 kernel/umd.c                | 11 ++++-------
 net/ipv4/bpfilter/sockopt.c |  2 +-
 3 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/include/linux/umd.h b/include/linux/umd.h
index 58a9c603c78d..d827fb038d00 100644
--- a/include/linux/umd.h
+++ b/include/linux/umd.h
@@ -18,7 +18,7 @@ static inline void exit_umh(struct task_struct *tsk)
 #endif
 
 struct umd_info {
-	const char *cmdline;
+	const char *driver_name;
 	struct file *pipe_to_umh;
 	struct file *pipe_from_umh;
 	struct list_head list;
diff --git a/kernel/umd.c b/kernel/umd.c
index f7dacb19c705..7fe08a8eb231 100644
--- a/kernel/umd.c
+++ b/kernel/umd.c
@@ -67,9 +67,6 @@ static void umd_cleanup(struct subprocess_info *info)
  * @len: length of the blob
  * @info: information about usermode process (shouldn't be NULL)
  *
- * If info->cmdline is set it will be used as command line for the
- * user process, else "usermodehelper" is used.
- *
  * Returns either negative error or zero which indicates success
  * in executing a blob of bytes as a usermode process. In such
  * case 'struct umd_info *info' is populated with two pipes
@@ -79,7 +76,6 @@ static void umd_cleanup(struct subprocess_info *info)
  */
 int fork_usermode_blob(void *data, size_t len, struct umd_info *info)
 {
-	const char *cmdline = (info->cmdline) ? info->cmdline : "usermodehelper";
 	struct subprocess_info *sub_info;
 	char **argv = NULL;
 	struct file *file;
@@ -87,7 +83,7 @@ int fork_usermode_blob(void *data, size_t len, struct umd_info *info)
 	loff_t pos = 0;
 	int err;
 
-	file = shmem_kernel_file_setup("", len, 0);
+	file = shmem_kernel_file_setup(info->driver_name, len, 0);
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
@@ -100,11 +96,12 @@ int fork_usermode_blob(void *data, size_t len, struct umd_info *info)
 	}
 
 	err = -ENOMEM;
-	argv = argv_split(GFP_KERNEL, cmdline, NULL);
+	argv = argv_split(GFP_KERNEL, info->driver_name, NULL);
 	if (!argv)
 		goto out;
 
-	sub_info = call_usermodehelper_setup("none", argv, NULL, GFP_KERNEL,
+	sub_info = call_usermodehelper_setup(info->driver_name, argv, NULL,
+					     GFP_KERNEL,
 					     umd_setup, umd_cleanup, info);
 	if (!sub_info)
 		goto out;
diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c
index c0dbcc86fcdb..5050de28333d 100644
--- a/net/ipv4/bpfilter/sockopt.c
+++ b/net/ipv4/bpfilter/sockopt.c
@@ -70,7 +70,7 @@ static int __init bpfilter_sockopt_init(void)
 {
 	mutex_init(&bpfilter_ops.lock);
 	bpfilter_ops.stop = true;
-	bpfilter_ops.info.cmdline = "bpfilter_umh";
+	bpfilter_ops.info.driver_name = "bpfilter_umh";
 	bpfilter_ops.info.cleanup = &bpfilter_umh_cleanup;
 
 	return 0;
-- 
2.25.0
* [PATCH v2 08/15] umd: Transform fork_usermode_blob into fork_usermode_driver
  2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman
                     ` (6 preceding siblings ...)
  2020-06-29 20:02   ` [PATCH v2 07/15] umd: Rename umd_info.cmdline umd_info.driver_name Eric W. Biederman
@ 2020-06-29 20:03   ` Eric W. Biederman
  2020-06-29 20:03   ` [PATCH v2 09/15] umh: Stop calling do_execve_file Eric W. Biederman
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 20:03 UTC
To: linux-kernel
Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

Instead of loading a binary blob into a temporary file with
shmem_kernel_file_setup, load a binary blob into a temporary tmpfs
filesystem. This means that the blob can be stored in an init section
and discarded, and it means the binary blob will have a filename so it
can be executed normally.

The only tricky thing about this code is that in the helper function
blob_to_mnt __fput_sync is used. That is because a file cannot be
executed if it is still open for write, and the ordinary delayed close
for kernel threads does not happen soon enough, which causes the
following exec to fail. The function umd_load_blob is not called with
any locks so this should be safe.

Executing the blob normally winds up correcting several problems with
the user mode driver code discovered by Tetsuo Handa[1]. By passing
an ordinary filename into the exec, it is no longer necessary to
figure out how to turn an O_RDWR file descriptor into a properly
reference counted O_EXEC file descriptor that forbids all writes. For
path based LSMs there are no new special cases.

[1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/
Link: https://lkml.kernel.org/r/87d05mf0j9.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/umd.h          |   6 +-
 kernel/umd.c                 | 126 +++++++++++++++++++++++++++--------
 net/bpfilter/bpfilter_kern.c |  14 +++-
 3 files changed, 113 insertions(+), 33 deletions(-)

diff --git a/include/linux/umd.h b/include/linux/umd.h
index d827fb038d00..12ff8f753ea7 100644
--- a/include/linux/umd.h
+++ b/include/linux/umd.h
@@ -2,6 +2,7 @@
 #define __LINUX_UMD_H__
 
 #include <linux/umh.h>
+#include <linux/path.h>
 
 #ifdef CONFIG_BPFILTER
 void __exit_umh(struct task_struct *tsk);
@@ -23,8 +24,11 @@ struct umd_info {
 	struct file *pipe_from_umh;
 	struct list_head list;
 	void (*cleanup)(struct umd_info *info);
+	struct path wd;
 	pid_t pid;
 };
-int fork_usermode_blob(void *data, size_t len, struct umd_info *info);
+int umd_load_blob(struct umd_info *info, const void *data, size_t len);
+int umd_unload_blob(struct umd_info *info);
+int fork_usermode_driver(struct umd_info *info);
 
 #endif /* __LINUX_UMD_H__ */
diff --git a/kernel/umd.c b/kernel/umd.c
index 7fe08a8eb231..aaa6f3142e52 100644
--- a/kernel/umd.c
+++ b/kernel/umd.c
@@ -4,11 +4,98 @@
  */
 #include <linux/shmem_fs.h>
 #include <linux/pipe_fs_i.h>
+#include <linux/mount.h>
+#include <linux/fs_struct.h>
+#include <linux/task_work.h>
 #include <linux/umd.h>
 
 static LIST_HEAD(umh_list);
 static DEFINE_MUTEX(umh_list_lock);
 
+static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *name)
+{
+	struct file_system_type *type;
+	struct vfsmount *mnt;
+	struct file *file;
+	ssize_t written;
+	loff_t pos = 0;
+
+	type = get_fs_type("tmpfs");
+	if (!type)
+		return ERR_PTR(-ENODEV);
+
+	mnt = kern_mount(type);
+	put_filesystem(type);
+	if (IS_ERR(mnt))
+		return mnt;
+
+	file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY, 0700);
+	if (IS_ERR(file)) {
+		mntput(mnt);
+		return ERR_CAST(file);
+	}
+
+	written = kernel_write(file, data, len, &pos);
+	if (written != len) {
+		int err = written;
+		if (err >= 0)
+			err = -ENOMEM;
+		filp_close(file, NULL);
+		mntput(mnt);
+		return ERR_PTR(err);
+	}
+
+	fput(file);
+
+	/* Flush delayed fput so exec can open the file read-only */
+	flush_delayed_fput();
+	task_work_run();
+	return mnt;
+}
+
+/**
+ * umd_load_blob - Remember a blob of bytes for fork_usermode_driver
+ * @info: information about usermode driver
+ * @data: a blob of bytes that can be executed as a file
+ * @len: The length of the blob
+ *
+ */
+int umd_load_blob(struct umd_info *info, const void *data, size_t len)
+{
+	struct vfsmount *mnt;
+
+	if (WARN_ON_ONCE(info->wd.dentry || info->wd.mnt))
+		return -EBUSY;
+
+	mnt = blob_to_mnt(data, len, info->driver_name);
+	if (IS_ERR(mnt))
+		return PTR_ERR(mnt);
+
+	info->wd.mnt = mnt;
+	info->wd.dentry = mnt->mnt_root;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(umd_load_blob);
+
+/**
+ * umd_unload_blob - Disassociate @info from a previously loaded blob
+ * @info: information about usermode driver
+ *
+ */
+int umd_unload_blob(struct umd_info *info)
+{
+	if (WARN_ON_ONCE(!info->wd.mnt ||
+			 !info->wd.dentry ||
+			 info->wd.mnt->mnt_root != info->wd.dentry))
+		return -EINVAL;
+
+	kern_unmount(info->wd.mnt);
+	info->wd.mnt = NULL;
+	info->wd.dentry = NULL;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(umd_unload_blob);
+
 static int umd_setup(struct subprocess_info *info, struct cred *new)
 {
 	struct umd_info *umd_info = info->data;
@@ -43,6 +130,7 @@ static int umd_setup(struct subprocess_info *info, struct cred *new)
 		return err;
 	}
 
+	set_fs_pwd(current->fs, &umd_info->wd);
 	umd_info->pipe_to_umh = to_umh[1];
 	umd_info->pipe_from_umh = from_umh[0];
 	umd_info->pid = task_pid_nr(current);
@@ -62,39 +150,21 @@ static void umd_cleanup(struct subprocess_info *info)
 }
 
 /**
- * fork_usermode_blob - fork a blob of bytes as a usermode process
- * @data: a blob of bytes that can be do_execv-ed as a file
- * @len: length of the blob
- * @info: information about usermode process (shouldn't be NULL)
+ * fork_usermode_driver - fork a usermode driver
+ * @info: information about usermode driver (shouldn't be NULL)
  *
- * Returns either negative error or zero which indicates success
- * in executing a blob of bytes as a usermode process. In such
- * case 'struct umd_info *info' is populated with two pipes
- * and a pid of the process. The caller is responsible for health
- * check of the user process, killing it via pid, and closing the
- * pipes when user process is no longer needed.
+ * Returns either negative error or zero which indicates success in
+ * executing a usermode driver. In such case 'struct umd_info *info'
+ * is populated with two pipes and a pid of the process. The caller is
+ * responsible for health check of the user process, killing it via
+ * pid, and closing the pipes when user process is no longer needed.
  */
-int fork_usermode_blob(void *data, size_t len, struct umd_info *info)
+int fork_usermode_driver(struct umd_info *info)
 {
 	struct subprocess_info *sub_info;
 	char **argv = NULL;
-	struct file *file;
-	ssize_t written;
-	loff_t pos = 0;
 	int err;
 
-	file = shmem_kernel_file_setup(info->driver_name, len, 0);
-	if (IS_ERR(file))
-		return PTR_ERR(file);
-
-	written = kernel_write(file, data, len, &pos);
-	if (written != len) {
-		err = written;
-		if (err >= 0)
-			err = -ENOMEM;
-		goto out;
-	}
-
 	err = -ENOMEM;
 	argv = argv_split(GFP_KERNEL, info->driver_name, NULL);
 	if (!argv)
@@ -106,7 +176,6 @@ int fork_usermode_blob(void *data, size_t len, struct umd_info *info)
 	if (!sub_info)
 		goto out;
 
-	sub_info->file = file;
 	err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC);
 	if (!err) {
 		mutex_lock(&umh_list_lock);
@@ -116,10 +185,9 @@ int fork_usermode_blob(void *data, size_t len, struct umd_info *info)
 out:
 	if (argv)
 		argv_free(argv);
-	fput(file);
 	return err;
 }
-EXPORT_SYMBOL_GPL(fork_usermode_blob);
+EXPORT_SYMBOL_GPL(fork_usermode_driver);
 
 void __exit_umh(struct task_struct *tsk)
 {
diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index c0f0990f30b6..28883b00609d 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -77,9 +77,7 @@ static int start_umh(void)
 	int err;
 
 	/* fork usermode process */
-	err = fork_usermode_blob(&bpfilter_umh_start,
-				 &bpfilter_umh_end - &bpfilter_umh_start,
-				 &bpfilter_ops.info);
+	err = fork_usermode_driver(&bpfilter_ops.info);
 	if (err)
 		return err;
 	bpfilter_ops.stop = false;
@@ -98,6 +96,12 @@ static int __init load_umh(void)
 {
 	int err;
 
+	err = umd_load_blob(&bpfilter_ops.info,
+			    &bpfilter_umh_start,
+			    &bpfilter_umh_end - &bpfilter_umh_start);
+	if (err)
+		return err;
+
 	mutex_lock(&bpfilter_ops.lock);
 	if (!bpfilter_ops.stop) {
 		err = -EFAULT;
@@ -110,6 +114,8 @@ static int __init load_umh(void)
 	}
 out:
 	mutex_unlock(&bpfilter_ops.lock);
+	if (err)
+		umd_unload_blob(&bpfilter_ops.info);
 	return err;
 }
 
@@ -122,6 +128,8 @@ static void __exit fini_umh(void)
 		bpfilter_ops.sockopt = NULL;
 	}
 	mutex_unlock(&bpfilter_ops.lock);
+
+	umd_unload_blob(&bpfilter_ops.info);
 }
 module_init(load_umh);
 module_exit(fini_umh);
-- 
2.25.0
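The write, close, then exec ordering in blob_to_mnt() has a straightforward userspace analogue. The sketch below assumes ordinary open(), write() and close() in place of file_open_root(), kernel_write() and fput(). It creates the file with mode 0700, reports a short write as -ENOMEM as the patch does, and closes the descriptor before returning so that nothing holds the file open for write when it is later executed. write_blob() is a hypothetical helper for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical userspace helper, loosely mirroring blob_to_mnt():
 * store @len bytes of @data in a new executable file at @path.
 * Returns 0 on success or a negative errno. A short write is treated
 * as an error and reported as -ENOMEM, following the patch.
 */
int write_blob(const char *path, const void *data, size_t len)
{
	ssize_t written;
	int fd, err = 0;

	assert(path && data);

	fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0700);
	if (fd < 0)
		return -errno;

	written = write(fd, data, len);
	if (written < 0)
		err = -errno;
	else if ((size_t)written != len)
		err = -ENOMEM;

	/* Drop the writable descriptor before anyone tries to exec @path. */
	if (close(fd) < 0 && !err)
		err = -errno;

	/* Do not leave a partial blob behind. */
	if (err)
		unlink(path);
	return err;
}
```

On Linux, trying to execute a file that some process still has open for writing normally fails with ETXTBSY, which is the userspace-visible form of the constraint the commit message describes.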
* [PATCH v2 09/15] umh: Stop calling do_execve_file
  2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman
                     ` (7 preceding siblings ...)
  2020-06-29 20:03   ` [PATCH v2 08/15] umd: Transform fork_usermode_blob into fork_usermode_driver Eric W. Biederman
@ 2020-06-29 20:03   ` Eric W. Biederman
  2020-06-29 20:04   ` [PATCH v2 10/15] exec: Remove do_execve_file Eric W. Biederman
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-06-29 20:03 UTC
To: linux-kernel
Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

With the user mode driver code changed to not set
subprocess_info.file there are no more users of subprocess_info.file.
Remove this field from struct subprocess_info and remove the only user
in call_usermodehelper_exec_async that would call do_execve_file
instead of do_execve if file was set.

Link: https://lkml.kernel.org/r/877dvuf0i7.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/umh.h |  1 -
 kernel/umh.c        | 10 +++-------
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/include/linux/umh.h b/include/linux/umh.h
index 73173c4a07e5..244aff638220 100644
--- a/include/linux/umh.h
+++ b/include/linux/umh.h
@@ -22,7 +22,6 @@ struct subprocess_info {
 	const char *path;
 	char **argv;
 	char **envp;
-	struct file *file;
 	int wait;
 	int retval;
 	int (*init)(struct subprocess_info *info, struct cred *new);
diff --git a/kernel/umh.c b/kernel/umh.c
index 3e4e453d45c8..6ca2096298b9 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -98,13 +98,9 @@ static int call_usermodehelper_exec_async(void *data)
 
 	commit_creds(new);
 
-	if (sub_info->file)
-		retval = do_execve_file(sub_info->file,
-					sub_info->argv, sub_info->envp);
-	else
-		retval = do_execve(getname_kernel(sub_info->path),
-				   (const char __user *const __user *)sub_info->argv,
-				   (const char __user *const __user *)sub_info->envp);
+	retval = do_execve(getname_kernel(sub_info->path),
+			   (const char __user *const __user *)sub_info->argv,
+			   (const char __user *const __user *)sub_info->envp);
 out:
 	sub_info->retval = retval;
 	/*
-- 
2.25.0
* [PATCH v2 10/15] exec: Remove do_execve_file 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (8 preceding siblings ...) 2020-06-29 20:03 ` [PATCH v2 09/15] umh: Stop calling do_execve_file Eric W. Biederman @ 2020-06-29 20:04 ` Eric W. Biederman 2020-06-30 5:43 ` Christoph Hellwig 2020-06-29 20:05 ` [PATCH v2 11/15] bpfilter: Move bpfilter_umh back into init data Eric W. Biederman ` (7 subsequent siblings) 17 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-06-29 20:04 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

Now that the last caller has been removed, remove this code from exec.

For anyone thinking of resurrecting do_execve_file please note that the code was buggy in several fundamental ways.

- It did not ensure the file it was passed was read-only and that deny_write_access had been called on it. Which subtly breaks invariants in exec.

- The caller of do_execve_file was expected to hold and put a reference to the file, but an extra reference for use by exec was not taken so that when exec put its reference to the file an underflow occurred on the file reference count.

- The point of the interface was so that a pathname did not need to exist. Which breaks pathname based LSMs.

Tetsuo Handa originally reported these issues[1]. While it was clear that deny_write_access was missing, the fundamental incompatibility with the passed in O_RDWR filehandle was not immediately recognized.

All of these issues were fixed by modifying the usermode driver code to have a path, so it did not need this hack.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ Link: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/exec.c | 38 +++++++++----------------------------- include/linux/binfmts.h | 1 - 2 files changed, 9 insertions(+), 30 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index e6e8a9a70327..23dfbb820626 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1818,13 +1818,14 @@ static int exec_binprm(struct linux_binprm *bprm) /* * sys_execve() executes a new program. */ -static int __do_execve_file(int fd, struct filename *filename, - struct user_arg_ptr argv, - struct user_arg_ptr envp, - int flags, struct file *file) +static int do_execveat_common(int fd, struct filename *filename, + struct user_arg_ptr argv, + struct user_arg_ptr envp, + int flags) { char *pathbuf = NULL; struct linux_binprm *bprm; + struct file *file; struct files_struct *displaced; int retval; @@ -1863,8 +1864,7 @@ static int __do_execve_file(int fd, struct filename *filename, check_unsafe_exec(bprm); current->in_execve = 1; - if (!file) - file = do_open_execat(fd, filename, flags); + file = do_open_execat(fd, filename, flags); retval = PTR_ERR(file); if (IS_ERR(file)) goto out_unmark; @@ -1872,9 +1872,7 @@ static int __do_execve_file(int fd, struct filename *filename, sched_exec(); bprm->file = file; - if (!filename) { - bprm->filename = "none"; - } else if (fd == AT_FDCWD || filename->name[0] == '/') { + if (fd == AT_FDCWD || filename->name[0] == '/') { bprm->filename = filename->name; } else { if (filename->name[0] == '\0') @@ -1935,8 +1933,7 @@ static int __do_execve_file(int fd, struct filename *filename, task_numa_free(current, false); free_bprm(bprm); kfree(pathbuf); - if (filename) - putname(filename); + putname(filename); if (displaced) 
put_files_struct(displaced); return retval; @@ -1967,27 +1964,10 @@ static int __do_execve_file(int fd, struct filename *filename, if (displaced) reset_files_struct(displaced); out_ret: - if (filename) - putname(filename); + putname(filename); return retval; } -static int do_execveat_common(int fd, struct filename *filename, - struct user_arg_ptr argv, - struct user_arg_ptr envp, - int flags) -{ - return __do_execve_file(fd, filename, argv, envp, flags, NULL); -} - -int do_execve_file(struct file *file, void *__argv, void *__envp) -{ - struct user_arg_ptr argv = { .ptr.native = __argv }; - struct user_arg_ptr envp = { .ptr.native = __envp }; - - return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file); -} - int do_execve(struct filename *filename, const char __user *const __user *__argv, const char __user *const __user *__envp) diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 4a20b7517dd0..7c27d7b57871 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -141,6 +141,5 @@ extern int do_execveat(int, struct filename *, const char __user * const __user *, const char __user * const __user *, int); -int do_execve_file(struct file *file, void *__argv, void *__envp); #endif /* _LINUX_BINFMTS_H */ -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
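The reference-count problem described in the commit message above can be modelled in a few lines of userspace C. This is only a toy sketch under stated assumptions: toy_file, toy_get, toy_put, toy_exec and caller_underflows are hypothetical names invented for illustration, not kernel interfaces, and the real struct file accounting is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct file: only the reference count matters here. */
struct toy_file {
	int refs;
};

static void toy_get(struct toy_file *f)
{
	assert(f != NULL);
	f->refs++;
}

/* Drop one reference; returns true if the count went negative (underflow). */
static bool toy_put(struct toy_file *f)
{
	assert(f != NULL);
	f->refs--;
	return f->refs < 0;
}

/*
 * Models exec consuming a file: exec always drops one reference when it
 * is done. If take_own_ref is false, that drop eats the caller's reference.
 */
static void toy_exec(struct toy_file *f, bool take_own_ref)
{
	if (take_own_ref)
		toy_get(f);
	(void)toy_put(f);	/* exec's put */
}

/* The caller holds one reference across the call and puts it afterwards. */
static bool caller_underflows(bool take_own_ref)
{
	struct toy_file f = { .refs = 1 };	/* the caller's reference */

	toy_exec(&f, take_own_ref);
	return toy_put(&f);			/* the caller's put */
}
```

Calling caller_underflows(false) models the behaviour the commit message attributes to do_execve_file (two puts against a single reference), while caller_underflows(true) models an exec path that takes its own reference first.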
* Re: [PATCH v2 10/15] exec: Remove do_execve_file 2020-06-29 20:04 ` [PATCH v2 10/15] exec: Remove do_execve_file Eric W. Biederman @ 2020-06-30 5:43 ` Christoph Hellwig 2020-06-30 12:14 ` Eric W. Biederman 0 siblings, 1 reply; 72+ messages in thread From: Christoph Hellwig @ 2020-06-30 5:43 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds FYI, this clashes badly with my exec rework. I'd suggest you drop everything touching exec here for now, and I can then add the final file based exec removal to the end of my series. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 10/15] exec: Remove do_execve_file 2020-06-30 5:43 ` Christoph Hellwig @ 2020-06-30 12:14 ` Eric W. Biederman 2020-06-30 13:38 ` Christoph Hellwig 0 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-06-30 12:14 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

Christoph Hellwig <hch@infradead.org> writes:

> FYI, this clashes badly with my exec rework. I'd suggest you
> drop everything touching exec here for now, and I can then
> add the final file based exec removal to the end of my series.

I have looked and I haven't even seen any exec work. Where can it be found?

I have been working on and cleaning up exec for what, 3 cycles now. There is still quite a ways to go before it becomes possible to fix some of the deep problems in exec. Removing all of these broken exec special cases is quite frankly the entire point of this patchset.

Sight unseen I suggest you send me your exec work and I can merge it into my branch if we are going to conflict badly.

Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 10/15] exec: Remove do_execve_file 2020-06-30 12:14 ` Eric W. Biederman @ 2020-06-30 13:38 ` Christoph Hellwig 2020-06-30 14:28 ` Eric W. Biederman 0 siblings, 1 reply; 72+ messages in thread From: Christoph Hellwig @ 2020-06-30 13:38 UTC (permalink / raw) To: Eric W. Biederman Cc: Christoph Hellwig, linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On Tue, Jun 30, 2020 at 07:14:23AM -0500, Eric W. Biederman wrote: > Christoph Hellwig <hch@infradead.org> writes: > > > FYI, this clashes badly with my exec rework. I'd suggest you > > drop everything touching exec here for now, and I can then > > add the final file based exec removal to the end of my series. > > I have looked and I haven't even seen any exec work. Where can it be > found? > > I have working and cleaning up exec for what 3 cycles now. There is > still quite a ways to go before it becomes possible to fix some of the > deep problems in exec. Removing all of these broken exec special cases > is quite frankly the entire point of this patchset. > > Sight unseen I suggest you send me your exec work and I can merge it > into my branch if we are going to conflict badly. https://lore.kernel.org/linux-fsdevel/20200627072704.2447163-1-hch@lst.de/T/#t ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 10/15] exec: Remove do_execve_file 2020-06-30 13:38 ` Christoph Hellwig @ 2020-06-30 14:28 ` Eric W. Biederman 2020-06-30 16:55 ` Alexei Starovoitov 0 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-06-30 14:28 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Christoph Hellwig <hch@infradead.org> writes: > On Tue, Jun 30, 2020 at 07:14:23AM -0500, Eric W. Biederman wrote: >> Christoph Hellwig <hch@infradead.org> writes: >> >> > FYI, this clashes badly with my exec rework. I'd suggest you >> > drop everything touching exec here for now, and I can then >> > add the final file based exec removal to the end of my series. >> >> I have looked and I haven't even seen any exec work. Where can it be >> found? >> >> I have working and cleaning up exec for what 3 cycles now. There is >> still quite a ways to go before it becomes possible to fix some of the >> deep problems in exec. Removing all of these broken exec special cases >> is quite frankly the entire point of this patchset. >> >> Sight unseen I suggest you send me your exec work and I can merge it >> into my branch if we are going to conflict badly. > > https://lore.kernel.org/linux-fsdevel/20200627072704.2447163-1-hch@lst.de/T/#t Looking at your final patch I do not like the construct. static int __do_execveat(int fd, struct filename *filename, const char __user *const __user *argv, const char __user *const __user *envp, const char *const *kernel_argv, const char *const *kernel_envp, int flags, struct file *file); It results in a function that is full of: if (kernel_argv) { // For kernel_exeveat ... } else { // For ordinary exeveat } Which while understandable. 
I do not think results in good long term maintainable code.

The current file parameter that I am getting rid of in my patchset is a stark example of that. Because of all of the if's no one realized that the code had its file reference counting wrong (among other bugs).

I think this is important to address as exec has already passed the point where people can fix all of the bugs in exec because the code is so hairy.

I think to be maintainable and clear the exec code is going to need to look something like:

static int bprm_execveat(int fd, struct filename *filename,
			 struct bprm *bprm, int flags);

int kernel_execve(const char *filename,
		  const char *const *argv, const char *const *envp, int flags)
{
	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
	bprm->argc = count_kernel_strings(argv);
	bprm->envc = count_kernel_strings(envp);
	prepare_arg_pages(bprm);
	copy_strings_kernel(bprm->envc, envp, bprm);
	copy_strings_kernel(bprm->argc, argv, bprm);
	ret = bprm_execveat(AT_FDCWD, filename, bprm, flags);
	free_bprm(bprm);
	return ret;
}

int do_execveat(int fd, const char *filename,
		const char __user *const __user *argv,
		const char __user *const __user *envp, int flags)
{
	bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
	bprm->argc = count_strings(argv);
	bprm->envc = count_strings(envp);
	prepare_arg_pages(bprm);
	copy_strings(bprm->envc, envp, bprm);
	copy_strings(bprm->argc, argv, bprm);
	ret = bprm_execveat(fd, filename, bprm, flags);
	free_bprm(bprm);
	return ret;
}

More work is required obviously to make the code above really work but when the dust clears a structure like that doesn't have funny edge cases that can hide bugs and make it tricky to change the code.

Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
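The shape proposed in the message above (thin per-source front ends that gather the arguments, then one common core with no source-specific branches) can be illustrated with a small userspace analogue. This is a sketch under stated assumptions, not the kernel design: toy_bprm, toy_exec_core, toy_exec_vector and toy_exec_packed are hypothetical names, and the two argument sources here (a NULL-terminated vector and a packed buffer) merely stand in for kernel and user memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ARGS 8

/* Hypothetical stand-in for a prepared binprm: arguments already gathered. */
struct toy_bprm {
	int argc;
	const char *argv[TOY_MAX_ARGS];
};

/* Common core: sees only the prepared toy_bprm, never where it came from. */
static size_t toy_exec_core(const struct toy_bprm *bprm)
{
	size_t total = 0;

	assert(bprm != NULL);
	for (int i = 0; i < bprm->argc; i++)
		total += strlen(bprm->argv[i]);
	return total;	/* total argument bytes, standing in for real work */
}

/* Front end 1: arguments arrive as a NULL-terminated vector. */
static size_t toy_exec_vector(const char *const *argv)
{
	struct toy_bprm bprm = { 0 };

	while (bprm.argc < TOY_MAX_ARGS && argv[bprm.argc]) {
		bprm.argv[bprm.argc] = argv[bprm.argc];
		bprm.argc++;
	}
	return toy_exec_core(&bprm);
}

/* Front end 2: arguments arrive packed back to back, ended by an empty string. */
static size_t toy_exec_packed(const char *buf)
{
	struct toy_bprm bprm = { 0 };

	while (bprm.argc < TOY_MAX_ARGS && *buf) {
		bprm.argv[bprm.argc++] = buf;
		buf += strlen(buf) + 1;
	}
	return toy_exec_core(&bprm);
}
```

The point of the split is that toy_exec_core contains no "which source was this?" conditionals; each front end owns exactly one way of collecting strings.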
* Re: [PATCH v2 10/15] exec: Remove do_execve_file 2020-06-30 14:28 ` Eric W. Biederman @ 2020-06-30 16:55 ` Alexei Starovoitov 0 siblings, 0 replies; 72+ messages in thread From: Alexei Starovoitov @ 2020-06-30 16:55 UTC (permalink / raw) To: Eric W. Biederman Cc: Christoph Hellwig, linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On Tue, Jun 30, 2020 at 09:28:10AM -0500, Eric W. Biederman wrote: > Christoph Hellwig <hch@infradead.org> writes: > > > On Tue, Jun 30, 2020 at 07:14:23AM -0500, Eric W. Biederman wrote: > >> Christoph Hellwig <hch@infradead.org> writes: > >> > >> > FYI, this clashes badly with my exec rework. I'd suggest you > >> > drop everything touching exec here for now, and I can then > >> > add the final file based exec removal to the end of my series. > >> > >> I have looked and I haven't even seen any exec work. Where can it be > >> found? > >> > >> I have working and cleaning up exec for what 3 cycles now. There is > >> still quite a ways to go before it becomes possible to fix some of the > >> deep problems in exec. Removing all of these broken exec special cases > >> is quite frankly the entire point of this patchset. > >> > >> Sight unseen I suggest you send me your exec work and I can merge it > >> into my branch if we are going to conflict badly. > > > > https://lore.kernel.org/linux-fsdevel/20200627072704.2447163-1-hch@lst.de/T/#t > > > Looking at your final patch I do not like the construct. 
> > static int __do_execveat(int fd, struct filename *filename, > const char __user *const __user *argv, > const char __user *const __user *envp, > const char *const *kernel_argv, > const char *const *kernel_envp, > int flags, struct file *file); > > > It results in a function that is full of: > if (kernel_argv) { > // For kernel_exeveat > ... > } else { > // For ordinary exeveat > > } > > Which while understandable. I do not think results in good long term > maintainble code. > > The current file paramter that I am getting rid of in my patchset is > a stark example of that. Because of all of the if's no one realized > that the code had it's file reference counting wrong (amoung other > bugs). > > I think this is important to address as exec has already passed > the point where people can fix all of the bugs in exec because > the code is so hairy. > > I think to be maintainable and clear the code exec code is going to > need to look something like: > > static int bprm_execveat(int fd, struct filename *filename, > struct bprm *bprm, int flags); > > int kernel_execve(const char *filename, > const char *const *argv, const char *const *envp, int flags) > { > bprm = kzalloc(sizeof(*pbrm), GFP_KERNEL); > bprm->argc = count_kernel_strings(argv); > bprm->envc = count_kernel_strings(envp); > prepare_arg_pages(bprm); > copy_strings_kernel(bprm->envc, envp, bprm); > copy_strings_kernel(bprm->argc, argc, bprm); > ret = bprm_execveat(AT_FDCWD, filename, bprm); > free_bprm(bprm); > return ret; > } > > int do_exeveat(int fd, const char *filename, > const char __user *const __user *argv, > const char __user *const __user *envp, int flags) > { > bprm = kzalloc(sizeof(*pbrm), GFP_KERNEL); > bprm->argc = count_strings(argv); > bprm->envc = count_strings(envp); > prepare_arg_pages(bprm); > copy_strings(bprm->envc, envp, bprm); > copy_strings(bprm->argc, argc, bprm); > ret = bprm_execveat(fd, filename, bprm); > free_bprm(bprm); > return ret; > } > > More work is required obviously to 
make the code above really work but > when the dust clears a structure like that doesn't have funny edge cases > that can hide bugs and make it tricky to change the code. +1 to the approach. I think Christoph's work needs to be on top of Eric's. ^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH v2 11/15] bpfilter: Move bpfilter_umh back into init data 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (9 preceding siblings ...) 2020-06-29 20:04 ` [PATCH v2 10/15] exec: Remove do_execve_file Eric W. Biederman @ 2020-06-29 20:05 ` Eric W. Biederman 2020-06-29 20:06 ` [PATCH v2 12/15] umd: Track user space drivers with struct pid Eric W. Biederman ` (6 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-06-29 20:05 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

To allow for restarts, 61fbf5933d42 ("net: bpfilter: restart bpfilter_umh when error occurred") moved the blob holding the userspace binary out of the init sections.

Now that loading the blob into a filesystem is separate from executing the blob, the blob no longer needs to live in .rodata to allow for restarting. So move the blob back to .init.rodata.

Link: https://lkml.kernel.org/r/87sgeidlvq.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 net/bpfilter/bpfilter_umh_blob.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S
index 9ea6100dca87..40311d10d2f2 100644
--- a/net/bpfilter/bpfilter_umh_blob.S
+++ b/net/bpfilter/bpfilter_umh_blob.S
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-	.section .rodata, "a"
+	.section .init.rodata, "a"
 	.global bpfilter_umh_start
 bpfilter_umh_start:
 	.incbin "net/bpfilter/bpfilter_umh"
--
2.25.0
^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v2 12/15] umd: Track user space drivers with struct pid 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (10 preceding siblings ...) 2020-06-29 20:05 ` [PATCH v2 11/15] bpfilter: Move bpfilter_umh back into init data Eric W. Biederman @ 2020-06-29 20:06 ` Eric W. Biederman 2020-06-29 20:06 ` [PATCH v2 13/15] bpfilter: Take advantage of the facilities of " Eric W. Biederman ` (5 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-06-29 20:06 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

Use struct pid instead of user space pid values that are prone to wrap around.

In addition track the entire thread group instead of just the first thread that is started by exec. There are no multi-threaded user mode drivers today but there is nothing precluding user drivers from being multi-threaded, so it is just a good idea to track the entire process.

Take a reference count on the tgid's in question to make it possible to remove exit_umh in a future change.

As a struct pid is available directly use kill_pid_info.

The prior process signalling code was iffy in using a userspace pid known to be in the initial pid namespace and then looking up its task in whatever the current pid namespace is. It worked only because kernel threads always run in the initial pid namespace.

As the tgid is now refcounted verify the tgid is NULL at the start of fork_usermode_driver to avoid the possibility of silent pid leaks.

Link: https://lkml.kernel.org/r/87mu4qdlv2.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com> --- include/linux/umd.h | 2 +- kernel/exit.c | 3 ++- kernel/umd.c | 15 ++++++++++----- net/bpfilter/bpfilter_kern.c | 13 +++++-------- net/ipv4/bpfilter/sockopt.c | 3 ++- 5 files changed, 20 insertions(+), 16 deletions(-) diff --git a/include/linux/umd.h b/include/linux/umd.h index 12ff8f753ea7..edb1c62c62f4 100644 --- a/include/linux/umd.h +++ b/include/linux/umd.h @@ -25,7 +25,7 @@ struct umd_info { struct list_head list; void (*cleanup)(struct umd_info *info); struct path wd; - pid_t pid; + struct pid *tgid; }; int umd_load_blob(struct umd_info *info, const void *data, size_t len); int umd_unload_blob(struct umd_info *info); diff --git a/kernel/exit.c b/kernel/exit.c index b94fe03e609c..b53107abdd31 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -805,7 +805,8 @@ void __noreturn do_exit(long code) exit_task_namespaces(tsk); exit_task_work(tsk); exit_thread(tsk); - exit_umh(tsk); + if (group_dead) + exit_umh(tsk); /* * Flush inherited counters to the parent - before the parent diff --git a/kernel/umd.c b/kernel/umd.c index aaa6f3142e52..c1e8eccaee76 100644 --- a/kernel/umd.c +++ b/kernel/umd.c @@ -133,7 +133,7 @@ static int umd_setup(struct subprocess_info *info, struct cred *new) set_fs_pwd(current->fs, &umd_info->wd); umd_info->pipe_to_umh = to_umh[1]; umd_info->pipe_from_umh = from_umh[0]; - umd_info->pid = task_pid_nr(current); + umd_info->tgid = get_pid(task_tgid(current)); current->flags |= PF_UMH; return 0; } @@ -146,6 +146,8 @@ static void umd_cleanup(struct subprocess_info *info) if (info->retval) { fput(umd_info->pipe_to_umh); fput(umd_info->pipe_from_umh); + put_pid(umd_info->tgid); + umd_info->tgid = NULL; } } @@ -155,9 +157,9 @@ static void umd_cleanup(struct subprocess_info *info) * * Returns either negative error or zero which indicates success in * executing a usermode driver. In such case 'struct umd_info *info' - * is populated with two pipes and a pid of the process. 
The caller is + * is populated with two pipes and a tgid of the process. The caller is * responsible for health check of the user process, killing it via - * pid, and closing the pipes when user process is no longer needed. + * tgid, and closing the pipes when user process is no longer needed. */ int fork_usermode_driver(struct umd_info *info) { @@ -165,6 +167,9 @@ int fork_usermode_driver(struct umd_info *info) char **argv = NULL; int err; + if (WARN_ON_ONCE(info->tgid)) + return -EBUSY; + err = -ENOMEM; argv = argv_split(GFP_KERNEL, info->driver_name, NULL); if (!argv) @@ -192,11 +197,11 @@ EXPORT_SYMBOL_GPL(fork_usermode_driver); void __exit_umh(struct task_struct *tsk) { struct umd_info *info; - pid_t pid = tsk->pid; + struct pid *tgid = task_tgid(tsk); mutex_lock(&umh_list_lock); list_for_each_entry(info, &umh_list, list) { - if (info->pid == pid) { + if (info->tgid == tgid) { list_del(&info->list); mutex_unlock(&umh_list_lock); goto out; diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c index 28883b00609d..b73dedeb6dbf 100644 --- a/net/bpfilter/bpfilter_kern.c +++ b/net/bpfilter/bpfilter_kern.c @@ -15,16 +15,13 @@ extern char bpfilter_umh_end; static void shutdown_umh(void) { - struct task_struct *tsk; + struct umd_info *info = &bpfilter_ops.info; + struct pid *tgid = info->tgid; if (bpfilter_ops.stop) return; - tsk = get_pid_task(find_vpid(bpfilter_ops.info.pid), PIDTYPE_PID); - if (tsk) { - send_sig(SIGKILL, tsk, 1); - put_task_struct(tsk); - } + kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); } static void __stop_umh(void) @@ -48,7 +45,7 @@ static int __bpfilter_process_sockopt(struct sock *sk, int optname, req.cmd = optname; req.addr = (long __force __user)optval; req.len = optlen; - if (!bpfilter_ops.info.pid) + if (!bpfilter_ops.info.tgid) goto out; n = __kernel_write(bpfilter_ops.info.pipe_to_umh, &req, sizeof(req), &pos); @@ -81,7 +78,7 @@ static int start_umh(void) if (err) return err; bpfilter_ops.stop = false; - pr_info("Loaded 
bpfilter_umh pid %d\n", bpfilter_ops.info.pid); + pr_info("Loaded bpfilter_umh pid %d\n", pid_nr(bpfilter_ops.info.tgid)); /* health check that usermode process started correctly */ if (__bpfilter_process_sockopt(NULL, 0, NULL, 0, 0) != 0) { diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c index 5050de28333d..56cbc43145f6 100644 --- a/net/ipv4/bpfilter/sockopt.c +++ b/net/ipv4/bpfilter/sockopt.c @@ -18,7 +18,8 @@ static void bpfilter_umh_cleanup(struct umd_info *info) bpfilter_ops.stop = true; fput(info->pipe_to_umh); fput(info->pipe_from_umh); - info->pid = 0; + put_pid(info->tgid); + info->tgid = NULL; mutex_unlock(&bpfilter_ops.lock); } -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
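The motivation in the patch above, that a numeric pid can be reused while a refcounted struct pid keeps naming one specific process, can be shown with a userspace toy model. Everything here is an assumption made for illustration: toy_pid and the toy_* helpers are hypothetical and only loosely mirror the roles of get_pid, put_pid and a liveness check; they are not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy analogue of struct pid: a refcounted object tied to one process. */
struct toy_pid {
	int nr;		/* numeric value; may be handed out again after exit */
	int refs;
	bool alive;
};

/* "Fork" a process; the process itself owns the initial reference. */
static struct toy_pid *toy_spawn(int nr)
{
	struct toy_pid *p = malloc(sizeof(*p));

	assert(p != NULL);
	p->nr = nr;
	p->refs = 1;
	p->alive = true;
	return p;
}

/* Take an extra reference so the object outlives the process. */
static struct toy_pid *toy_get_pid(struct toy_pid *p)
{
	p->refs++;
	return p;
}

/* Drop a reference; the object is freed with the last one. */
static void toy_put_pid(struct toy_pid *p)
{
	if (--p->refs == 0)
		free(p);
}

/* The process exits: mark it dead and drop the process's own reference. */
static void toy_exit(struct toy_pid *p)
{
	p->alive = false;
	toy_put_pid(p);
}

/* Liveness test on the handle, independent of the numeric value. */
static bool toy_has_task(const struct toy_pid *p)
{
	return p->alive;
}
```

A holder that kept only the number 42 could not tell a dead driver from an unrelated process that later received the same number; a holder of the refcounted handle can, and must remember to drop its reference, which is the leak the WARN_ON_ONCE check in fork_usermode_driver guards against.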
* [PATCH v2 13/15] bpfilter: Take advantage of the facilities of struct pid 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (11 preceding siblings ...) 2020-06-29 20:06 ` [PATCH v2 12/15] umd: Track user space drivers with struct pid Eric W. Biederman @ 2020-06-29 20:06 ` Eric W. Biederman 2020-06-29 20:07 ` [PATCH v2 14/15] umd: Remove exit_umh Eric W. Biederman ` (4 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-06-29 20:06 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Instead of relying on the exit_umh cleanup callback use the fact a struct pid can be tested to see if a process still exists, and that struct pid has a wait queue that notifies when the process dies. Link: https://lkml.kernel.org/r/87h7uydlu9.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/bpfilter.h | 3 ++- net/bpfilter/bpfilter_kern.c | 15 +++++---------- net/ipv4/bpfilter/sockopt.c | 15 ++++++++------- 3 files changed, 15 insertions(+), 18 deletions(-) diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h index 4b43d2240172..8073ddce73b1 100644 --- a/include/linux/bpfilter.h +++ b/include/linux/bpfilter.h @@ -10,6 +10,8 @@ int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval, unsigned int optlen); int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval, int __user *optlen); +void bpfilter_umh_cleanup(struct umd_info *info); + struct bpfilter_umh_ops { struct umd_info info; /* since ip_getsockopt() can run in parallel, serialize access to umh */ @@ -18,7 +20,6 @@ struct bpfilter_umh_ops { char __user *optval, unsigned int optlen, bool is_set); int (*start)(void); - bool stop; }; extern struct bpfilter_umh_ops bpfilter_ops; #endif diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c index b73dedeb6dbf..91474884ddb7 100644 --- a/net/bpfilter/bpfilter_kern.c +++ b/net/bpfilter/bpfilter_kern.c @@ -18,10 +18,11 @@ static void shutdown_umh(void) struct umd_info *info = &bpfilter_ops.info; struct pid *tgid = info->tgid; - if (bpfilter_ops.stop) - return; - - kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); + if (tgid) { + kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); + wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); + bpfilter_umh_cleanup(info); + } } static void __stop_umh(void) @@ -77,7 +78,6 @@ static int start_umh(void) err = fork_usermode_driver(&bpfilter_ops.info); if (err) return err; - bpfilter_ops.stop = false; pr_info("Loaded bpfilter_umh pid %d\n", pid_nr(bpfilter_ops.info.tgid)); /* health check that usermode process started correctly */ @@ -100,16 +100,11 @@ static int __init load_umh(void) return err; mutex_lock(&bpfilter_ops.lock); - if (!bpfilter_ops.stop) { - err = -EFAULT; - goto out; - } err = 
start_umh(); if (!err && IS_ENABLED(CONFIG_INET)) { bpfilter_ops.sockopt = &__bpfilter_process_sockopt; bpfilter_ops.start = &start_umh; } -out: mutex_unlock(&bpfilter_ops.lock); if (err) umd_unload_blob(&bpfilter_ops.info); diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c index 56cbc43145f6..9455eb9cec78 100644 --- a/net/ipv4/bpfilter/sockopt.c +++ b/net/ipv4/bpfilter/sockopt.c @@ -12,16 +12,14 @@ struct bpfilter_umh_ops bpfilter_ops; EXPORT_SYMBOL_GPL(bpfilter_ops); -static void bpfilter_umh_cleanup(struct umd_info *info) +void bpfilter_umh_cleanup(struct umd_info *info) { - mutex_lock(&bpfilter_ops.lock); - bpfilter_ops.stop = true; fput(info->pipe_to_umh); fput(info->pipe_from_umh); put_pid(info->tgid); info->tgid = NULL; - mutex_unlock(&bpfilter_ops.lock); } +EXPORT_SYMBOL_GPL(bpfilter_umh_cleanup); static int bpfilter_mbox_request(struct sock *sk, int optname, char __user *optval, @@ -39,7 +37,11 @@ static int bpfilter_mbox_request(struct sock *sk, int optname, goto out; } } - if (bpfilter_ops.stop) { + if (bpfilter_ops.info.tgid && + !pid_has_task(bpfilter_ops.info.tgid, PIDTYPE_TGID)) + bpfilter_umh_cleanup(&bpfilter_ops.info); + + if (!bpfilter_ops.info.tgid) { err = bpfilter_ops.start(); if (err) goto out; @@ -70,9 +72,8 @@ int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval, static int __init bpfilter_sockopt_init(void) { mutex_init(&bpfilter_ops.lock); - bpfilter_ops.stop = true; + bpfilter_ops.info.tgid = NULL; bpfilter_ops.info.driver_name = "bpfilter_umh"; - bpfilter_ops.info.cleanup = &bpfilter_umh_cleanup; return 0; } -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
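The restart logic this patch puts into bpfilter_mbox_request (clean up a stale handle if the process is gone, then start a driver if no handle is held) reduces to a small state machine. The following userspace sketch models only that control flow; toy_driver, toy_start, toy_cleanup and toy_request are hypothetical names, and the booleans stand in for info->tgid and pid_has_task().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the driver state tracked by the kernel side. */
struct toy_driver {
	bool has_handle;	/* stands in for info->tgid != NULL */
	bool task_alive;	/* stands in for pid_has_task(tgid, ...) */
	int starts;		/* how many times the driver was started */
};

/* Start the driver and remember a handle to it. */
static int toy_start(struct toy_driver *d)
{
	d->has_handle = true;
	d->task_alive = true;
	d->starts++;
	return 0;
}

/* Release the handle; only valid once the process is gone. */
static void toy_cleanup(struct toy_driver *d)
{
	assert(!d->task_alive);
	d->has_handle = false;
}

/* Handle one request, restarting the driver if it has died. */
static int toy_request(struct toy_driver *d)
{
	/* A handle to a dead process is stale: drop it first. */
	if (d->has_handle && !d->task_alive)
		toy_cleanup(d);

	/* No handle means no driver: start one before proceeding. */
	if (!d->has_handle) {
		int err = toy_start(d);

		if (err)
			return err;
	}
	return 0;
}
```

Because liveness is derived from the handle at request time, no separate "stop" flag or exit-time callback has to be kept in sync with the process, which is the simplification the patch describes.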
* [PATCH v2 14/15] umd: Remove exit_umh 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (12 preceding siblings ...) 2020-06-29 20:06 ` [PATCH v2 13/15] bpfilter: Take advantage of the facilities of " Eric W. Biederman @ 2020-06-29 20:07 ` Eric W. Biederman 2020-06-29 20:08 ` [PATCH v2 15/15] umd: Stop using split_argv Eric W. Biederman ` (3 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-06-29 20:07 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds

The bpfilter code no longer uses the umd_info.cleanup callback. This callback is what exit_umh exists to call. So remove exit_umh and all of its associated bookkeeping.

Link: https://lkml.kernel.org/r/87bll6dlte.fsf_-_@x220.int.ebiederm.org
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com> --- include/linux/sched.h | 1 - include/linux/umd.h | 16 ---------------- kernel/exit.c | 3 --- kernel/umd.c | 28 ---------------------------- 4 files changed, 48 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 59d1e92bb88e..edb2020875ad 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1511,7 +1511,6 @@ extern struct pid *cad_pid; #define PF_KTHREAD 0x00200000 /* I am a kernel thread */ #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ #define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */ -#define PF_UMH 0x02000000 /* I'm an Usermodehelper process */ #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ #define PF_MEMALLOC_NOCMA 0x10000000 /* All allocation request will have _GFP_MOVABLE cleared */ diff --git a/include/linux/umd.h b/include/linux/umd.h index edb1c62c62f4..71d8f4a41ad7 100644 --- a/include/linux/umd.h +++ b/include/linux/umd.h @@ -4,26 +4,10 @@ #include <linux/umh.h> #include <linux/path.h> -#ifdef CONFIG_BPFILTER -void __exit_umh(struct task_struct *tsk); - -static inline void exit_umh(struct task_struct *tsk) -{ - if (unlikely(tsk->flags & PF_UMH)) - __exit_umh(tsk); -} -#else -static inline void exit_umh(struct task_struct *tsk) -{ -} -#endif - struct umd_info { const char *driver_name; struct file *pipe_to_umh; struct file *pipe_from_umh; - struct list_head list; - void (*cleanup)(struct umd_info *info); struct path wd; struct pid *tgid; }; diff --git a/kernel/exit.c b/kernel/exit.c index b53107abdd31..42f079eb71e5 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -63,7 +63,6 @@ #include <linux/random.h> #include <linux/rcuwait.h> #include <linux/compat.h> -#include <linux/umd.h> #include <linux/uaccess.h> #include <asm/unistd.h> @@ -805,8 +804,6 @@ void __noreturn do_exit(long code) exit_task_namespaces(tsk); 
exit_task_work(tsk); exit_thread(tsk); - if (group_dead) - exit_umh(tsk); /* * Flush inherited counters to the parent - before the parent diff --git a/kernel/umd.c b/kernel/umd.c index c1e8eccaee76..4188b71de267 100644 --- a/kernel/umd.c +++ b/kernel/umd.c @@ -9,9 +9,6 @@ #include <linux/task_work.h> #include <linux/umd.h> -static LIST_HEAD(umh_list); -static DEFINE_MUTEX(umh_list_lock); - static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *name) { struct file_system_type *type; @@ -134,7 +131,6 @@ static int umd_setup(struct subprocess_info *info, struct cred *new) umd_info->pipe_to_umh = to_umh[1]; umd_info->pipe_from_umh = from_umh[0]; umd_info->tgid = get_pid(task_tgid(current)); - current->flags |= PF_UMH; return 0; } @@ -182,11 +178,6 @@ int fork_usermode_driver(struct umd_info *info) goto out; err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); - if (!err) { - mutex_lock(&umh_list_lock); - list_add(&info->list, &umh_list); - mutex_unlock(&umh_list_lock); - } out: if (argv) argv_free(argv); @@ -194,23 +185,4 @@ int fork_usermode_driver(struct umd_info *info) } EXPORT_SYMBOL_GPL(fork_usermode_driver); -void __exit_umh(struct task_struct *tsk) -{ - struct umd_info *info; - struct pid *tgid = task_tgid(tsk); - - mutex_lock(&umh_list_lock); - list_for_each_entry(info, &umh_list, list) { - if (info->tgid == tgid) { - list_del(&info->list); - mutex_unlock(&umh_list_lock); - goto out; - } - } - mutex_unlock(&umh_list_lock); - return; -out: - if (info->cleanup) - info->cleanup(info); -} -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v2 15/15] umd: Stop using split_argv 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (13 preceding siblings ...) 2020-06-29 20:07 ` [PATCH v2 14/15] umd: Remove exit_umh Eric W. Biederman @ 2020-06-29 20:08 ` Eric W. Biederman 2020-06-29 22:12 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Alexei Starovoitov ` (2 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-06-29 20:08 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds There is exactly one argument so there is nothing to split. All split_argv does now is cause confusion and avoid the need for a cast when passing a "const char *" string to call_usermodehelper_setup. So avoid confusion and the possibility of an odd driver name causing problems by just using a fixed argv array with a cast in the call to call_usermodehelper_setup. Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- kernel/umd.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/kernel/umd.c b/kernel/umd.c index 4188b71de267..ff79fb16d738 100644 --- a/kernel/umd.c +++ b/kernel/umd.c @@ -160,27 +160,21 @@ static void umd_cleanup(struct subprocess_info *info) int fork_usermode_driver(struct umd_info *info) { struct subprocess_info *sub_info; - char **argv = NULL; + const char *argv[] = { info->driver_name, NULL }; int err; if (WARN_ON_ONCE(info->tgid)) return -EBUSY; err = -ENOMEM; - argv = argv_split(GFP_KERNEL, info->driver_name, NULL); - if (!argv) - goto out; - - sub_info = call_usermodehelper_setup(info->driver_name, argv, NULL, - GFP_KERNEL, + sub_info = call_usermodehelper_setup(info->driver_name, + (char **)argv, NULL, GFP_KERNEL, umd_setup, umd_cleanup, info); if (!sub_info) goto out; err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); out: - if (argv) - argv_free(argv); return err; } EXPORT_SYMBOL_GPL(fork_usermode_driver); -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
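The reasoning in the commit message above (one argument, nothing to split, and an odd driver name could cause problems) can be illustrated with a small userspace sketch. This is not kernel code: split_on_spaces() is a hypothetical stand-in for what a whitespace-splitting helper does, and count_args(), args_when_split() and args_when_fixed() are illustrative names that do not appear in the patch. The sketch only shows that a name containing a space turns into two arguments when split, while a fixed two-element array always carries exactly one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 8

/* Count the entries of a NULL-terminated argument vector. */
static size_t count_args(const char *const *argv)
{
    size_t n = 0;

    while (argv[n])
        n++;
    return n;
}

/*
 * Hypothetical stand-in for a whitespace-splitting helper: splits buf
 * in place on spaces and fills argv with at most MAX_ARGS - 1 pointers
 * followed by a terminating NULL. Returns the number of arguments.
 */
static size_t split_on_spaces(char *buf, const char *argv[MAX_ARGS])
{
    size_t n = 0;
    char *tok = strtok(buf, " ");

    while (tok && n < MAX_ARGS - 1) {
        argv[n++] = tok;
        tok = strtok(NULL, " ");
    }
    argv[n] = NULL;
    return n;
}

/* Number of arguments produced when the driver name is split. */
static size_t args_when_split(const char *name)
{
    char buf[64];
    const char *argv[MAX_ARGS];

    assert(name != NULL);
    strncpy(buf, name, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return split_on_spaces(buf, argv);
}

/* Number of arguments when a fixed { name, NULL } array is used. */
static size_t args_when_fixed(const char *name)
{
    const char *argv[] = { name, NULL };

    assert(name != NULL);
    return count_args(argv);
}
```

With these helpers, args_when_split("odd name") yields two arguments while args_when_fixed("odd name") yields one; for a plain name such as "bpfilter_umh" both approaches agree.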
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (14 preceding siblings ...) 2020-06-29 20:08 ` [PATCH v2 15/15] umd: Stop using split_argv Eric W. Biederman @ 2020-06-29 22:12 ` Alexei Starovoitov 2020-06-30 1:13 ` Eric W. Biederman 2020-06-30 12:29 ` Eric W. Biederman 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman 2020-07-08 5:20 ` [PATCH v2 00/15] " Luis Chamberlain 17 siblings, 2 replies; 72+ messages in thread From: Alexei Starovoitov @ 2020-06-29 22:12 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On Mon, Jun 29, 2020 at 02:55:05PM -0500, Eric W. Biederman wrote: > > I have tested thes changes by booting with the code compiled in and > by killing "bpfilter_umh" and running iptables -vnL to restart > the userspace driver. > > I have compiled tested each change with and without CONFIG_BPFILTER > enabled. With CONFIG_BPFILTER=y CONFIG_BPFILTER_UMH=m it doesn't build: ERROR: modpost: "kill_pid_info" [net/bpfilter/bpfilter.ko] undefined! I've added: +EXPORT_SYMBOL(kill_pid_info); to continue testing... And then did: while true; do iptables -L;rmmod bpfilter; done Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event(). I suspect patch 13 is somehow responsible: + if (tgid) { + kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); + wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); + bpfilter_umh_cleanup(info); + } I cannot figure out why it hangs. Some sort of race ? Since adding short delay between kill and wait makes it work. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-29 22:12 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Alexei Starovoitov @ 2020-06-30 1:13 ` Eric W. Biederman 2020-06-30 6:16 ` Tetsuo Handa 2020-06-30 12:29 ` Eric W. Biederman 1 sibling, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-06-30 1:13 UTC (permalink / raw) To: Alexei Starovoitov Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Mon, Jun 29, 2020 at 02:55:05PM -0500, Eric W. Biederman wrote: >> >> I have tested thes changes by booting with the code compiled in and >> by killing "bpfilter_umh" and running iptables -vnL to restart >> the userspace driver. >> >> I have compiled tested each change with and without CONFIG_BPFILTER >> enabled. > > With > CONFIG_BPFILTER=y > CONFIG_BPFILTER_UMH=m > it doesn't build: > > ERROR: modpost: "kill_pid_info" [net/bpfilter/bpfilter.ko] undefined! > > I've added: > +EXPORT_SYMBOL(kill_pid_info); > to continue testing... > > And then did: > while true; do iptables -L;rmmod bpfilter; done > > Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event(). > > I suspect patch 13 is somehow responsible: > + if (tgid) { > + kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); > + wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); > + bpfilter_umh_cleanup(info); > + } > > I cannot figure out why it hangs. Some sort of race ? > Since adding short delay between kill and wait makes it work. Thanks. I will take a look. Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-30 1:13 ` Eric W. Biederman @ 2020-06-30 6:16 ` Tetsuo Handa 0 siblings, 0 replies; 72+ messages in thread From: Tetsuo Handa @ 2020-06-30 6:16 UTC (permalink / raw) To: Eric W. Biederman, Alexei Starovoitov Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On 2020/06/30 10:13, Eric W. Biederman wrote: > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > >> On Mon, Jun 29, 2020 at 02:55:05PM -0500, Eric W. Biederman wrote: >>> >>> I have tested thes changes by booting with the code compiled in and >>> by killing "bpfilter_umh" and running iptables -vnL to restart >>> the userspace driver. >>> >>> I have compiled tested each change with and without CONFIG_BPFILTER >>> enabled. >> >> With >> CONFIG_BPFILTER=y >> CONFIG_BPFILTER_UMH=m >> it doesn't build: >> >> ERROR: modpost: "kill_pid_info" [net/bpfilter/bpfilter.ko] undefined! >> >> I've added: >> +EXPORT_SYMBOL(kill_pid_info); >> to continue testing... kill_pid() is already exported. >> >> And then did: >> while true; do iptables -L;rmmod bpfilter; done >> >> Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event(). >> >> I suspect patch 13 is somehow responsible: >> + if (tgid) { >> + kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); >> + wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); >> + bpfilter_umh_cleanup(info); >> + } >> >> I cannot figure out why it hangs. Some sort of race ? >> Since adding short delay between kill and wait makes it work. 
Because there is a race window that detach_pid() from __unhash_process() from __exit_signal() from release_task() from exit_notify() from do_exit() is called some time after wake_up_all(&pid->wait_pidfd) from do_notify_pidfd() from do_notify_parent() from exit_notify() from do_exit() was called (in other words, we can't use pid->wait_pidfd when pid_task() is used at wait_event()) ? Below are changes I suggest. diff --git a/kernel/umd.c b/kernel/umd.c index ff79fb16d738..f688813b8830 100644 --- a/kernel/umd.c +++ b/kernel/umd.c @@ -26,7 +26,7 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na if (IS_ERR(mnt)) return mnt; - file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY, 0700); + file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY | O_EXCL, 0700); if (IS_ERR(file)) { mntput(mnt); return ERR_CAST(file); @@ -52,16 +52,18 @@ static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *na /** * umd_load_blob - Remember a blob of bytes for fork_usermode_driver - * @info: information about usermode driver - * @data: a blob of bytes that can be executed as a file - * @len: The lentgh of the blob + * @info: information about usermode driver (shouldn't be NULL) + * @data: a blob of bytes that can be executed as a file (shouldn't be NULL) + * @len: The lentgh of the blob (shouldn't be 0) * */ int umd_load_blob(struct umd_info *info, const void *data, size_t len) { struct vfsmount *mnt; - if (WARN_ON_ONCE(info->wd.dentry || info->wd.mnt)) + if (!info || !info->driver_name || !data || !len) + return -EINVAL; + if (info->wd.dentry || info->wd.mnt) return -EBUSY; mnt = blob_to_mnt(data, len, info->driver_name); @@ -76,15 +78,14 @@ EXPORT_SYMBOL_GPL(umd_load_blob); /** * umd_unload_blob - Disassociate @info from a previously loaded blob - * @info: information about usermode driver + * @info: information about usermode driver (shouldn't be NULL) * */ int umd_unload_blob(struct umd_info *info) { - if 
(WARN_ON_ONCE(!info->wd.mnt || - !info->wd.dentry || - info->wd.mnt->mnt_root != info->wd.dentry)) + if (!info || !info->driver_name || !info->wd.dentry || !info->wd.mnt) return -EINVAL; + BUG_ON(info->wd.mnt->mnt_root != info->wd.dentry); kern_unmount(info->wd.mnt); info->wd.mnt = NULL; @@ -138,7 +139,7 @@ static void umd_cleanup(struct subprocess_info *info) { struct umd_info *umd_info = info->data; - /* cleanup if umh_setup() was successful but exec failed */ + /* cleanup if umd_setup() was successful but exec failed */ if (info->retval) { fput(umd_info->pipe_to_umh); fput(umd_info->pipe_from_umh); @@ -163,7 +164,10 @@ int fork_usermode_driver(struct umd_info *info) const char *argv[] = { info->driver_name, NULL }; int err; - if (WARN_ON_ONCE(info->tgid)) + if (!info || !info->driver_name || !info->wd.dentry || !info->wd.mnt) + return -EINVAL; + BUG_ON(info->wd.mnt->mnt_root != info->wd.dentry); + if (info->tgid) return -EBUSY; err = -ENOMEM; diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c index 91474884ddb7..9dd70aacb81a 100644 --- a/net/bpfilter/bpfilter_kern.c +++ b/net/bpfilter/bpfilter_kern.c @@ -19,8 +19,13 @@ static void shutdown_umh(void) struct pid *tgid = info->tgid; if (tgid) { - kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); - wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); + kill_pid(tgid, SIGKILL, 1); + while (({ bool done; + rcu_read_lock(); + done = !pid_task(tgid, PIDTYPE_TGID); + rcu_read_unlock(); + !done; })) + schedule_timeout_uninterruptible(1); bpfilter_umh_cleanup(info); } } ^ permalink raw reply related [flat|nested] 72+ messages in thread
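The race described in the message above (the wake-up on pid->wait_pidfd is issued some time before detach_pid() makes pid_task() return NULL) can be modelled deterministically in userspace. The sketch below is only a model under stated assumptions: the tick numbers are invented, and task_gone(), wait_on_wakeups_only() and wait_with_timeout() are hypothetical names, not kernel functions. It compares a waiter that rechecks its condition only on entry and on wake-up with one that also rechecks on a periodic timeout.

```c
#include <stdbool.h>

/* Hypothetical timeline of the exiting driver, in ticks. */
enum {
    TICK_WAKE   = 1,  /* waiters on the wait queue are woken */
    TICK_DETACH = 2,  /* the task is detached from its struct pid */
    TICK_LAST   = 10  /* end of the simulated window */
};

/* Models the wait condition: true once the task has been detached. */
static bool task_gone(int tick)
{
    return tick >= TICK_DETACH;
}

/*
 * Models a wait that checks the condition on entry and afterwards
 * only when a wake-up arrives. Returns true if the wait finishes
 * inside the simulated window, false if it would still be asleep.
 */
static bool wait_on_wakeups_only(int start_tick)
{
    for (int tick = start_tick; tick <= TICK_LAST; tick++) {
        bool checks_now = (tick == start_tick) || (tick == TICK_WAKE);

        if (checks_now && task_gone(tick))
            return true;
    }
    return false;
}

/* Models a wait that also rechecks the condition on every tick. */
static bool wait_with_timeout(int start_tick)
{
    for (int tick = start_tick; tick <= TICK_LAST; tick++) {
        if (task_gone(tick))
            return true;
    }
    return false;
}
```

In this model a waiter that starts at tick 0 and relies on wake-ups alone never finishes, because its only wake-up arrives while the task is still attached. Starting the same waiter at tick 2 succeeds, which matches the observation that a short delay between kill and wait hides the hang, and the polling variant finishes regardless of when it starts.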
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-29 22:12 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Alexei Starovoitov 2020-06-30 1:13 ` Eric W. Biederman @ 2020-06-30 12:29 ` Eric W. Biederman 2020-06-30 13:21 ` Tetsuo Handa 2020-06-30 16:52 ` Alexei Starovoitov 1 sibling, 2 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-06-30 12:29 UTC (permalink / raw) To: Alexei Starovoitov Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Mon, Jun 29, 2020 at 02:55:05PM -0500, Eric W. Biederman wrote: >> >> I have tested thes changes by booting with the code compiled in and >> by killing "bpfilter_umh" and running iptables -vnL to restart >> the userspace driver. >> >> I have compiled tested each change with and without CONFIG_BPFILTER >> enabled. > > With > CONFIG_BPFILTER=y > CONFIG_BPFILTER_UMH=m > it doesn't build: > > ERROR: modpost: "kill_pid_info" [net/bpfilter/bpfilter.ko] undefined! > > I've added: > +EXPORT_SYMBOL(kill_pid_info); > to continue testing... I am rather surprised; I thought Tetsuo had already compile tested modules. > I suspect patch 13 is somehow responsible: > + if (tgid) { > + kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); > + wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); > + bpfilter_umh_cleanup(info); > + } > > I cannot figure out why it hangs. Some sort of race ? > Since adding short delay between kill and wait makes it work. Having had a chance to sleep, kill_pid_info was a thinko, as was !pid_task. It should have been !pid_has_task as that takes the proper rcu locking. 
I don't know if that is going to be enough to fix the wait_event but those are obvious bugs that need to be fixed. diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c index 91474884ddb7..3e1874030daa 100644 --- a/net/bpfilter/bpfilter_kern.c +++ b/net/bpfilter/bpfilter_kern.c @@ -19,8 +19,8 @@ static void shutdown_umh(void) struct pid *tgid = info->tgid; if (tgid) { - kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); - wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); + kill_pid(tgid, SIGKILL, 1); + wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); bpfilter_umh_cleanup(info); } } > And then did: > while true; do iptables -L;rmmod bpfilter; done > > Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event(). Hmm. The wake up of tgid->wait_pidfd happens just before release_task is called, so there is a race: it is possible to wake up and then go back to sleep before pid_has_task becomes false. So I think I need a friendly helper that does: bool task_has_exited(struct pid *tgid) { struct task_struct *tsk; bool exited; rcu_read_lock(); tsk = pid_task(tgid, PIDTYPE_TGID); /* no task left on the pid, or a task that has started exiting */ exited = !tsk; if (tsk) exited = !!tsk->exit_state; rcu_read_unlock(); return exited; } There should be a sensible way to do that. Eric ^ permalink raw reply related [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-30 12:29 ` Eric W. Biederman @ 2020-06-30 13:21 ` Tetsuo Handa 2020-07-02 13:08 ` Eric W. Biederman 2020-06-30 16:52 ` Alexei Starovoitov 1 sibling, 1 reply; 72+ messages in thread From: Tetsuo Handa @ 2020-06-30 13:21 UTC (permalink / raw) To: Eric W. Biederman, Alexei Starovoitov Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On 2020/06/30 21:29, Eric W. Biederman wrote: > Hmm. The wake up happens just of tgid->wait_pidfd happens just before > release_task is called so there is a race. As it is possible to wake > up and then go back to sleep before pid_has_task becomes false. What is the reason we want to wait until pid_has_task() becomes false? - wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); + while (!wait_event_timeout(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID), 1)); By the way, commit 4a9d4b024a3102fc ("switch fput to task_work_add") says that use of flush_delayed_fput() has to be careful. Al, is it safe to call flush_delayed_fput() from blob_to_mnt() from umd_load_blob() (which might be called from both kernel thread and from process context (e.g. init_module() syscall by /sbin/insmod )) ? ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-30 13:21 ` Tetsuo Handa @ 2020-07-02 13:08 ` Eric W. Biederman 2020-07-02 13:40 ` Tetsuo Handa 0 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 13:08 UTC (permalink / raw) To: Tetsuo Handa Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > On 2020/06/30 21:29, Eric W. Biederman wrote: >> Hmm. The wake up happens just of tgid->wait_pidfd happens just before >> release_task is called so there is a race. As it is possible to wake >> up and then go back to sleep before pid_has_task becomes false. > > What is the reason we want to wait until pid_has_task() becomes false? > > - wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); > + while (!wait_event_timeout(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID), 1)); So that it is safe to call bpfilter_umh_cleanup. The previous code performed the wait by having a callback in do_exit. It might be possible to call bpf_umh_cleanup early but I have not done that analysis. To perform the test correctly what I have right now is: bool thread_group_exited(struct pid *pid) { struct task_struct *tsk; bool exited; rcu_read_lock(); tsk = pid_task(pid, PIDTYPE_PID); exited = !tsk || (READ_ONCE(tsk->exit_state) && thread_group_empty(tsk)); rcu_read_unlock(); return exited; } Which is factored out of pidfd_poll. Which means that this won't be something that the bpfilter code has to maintain. That seems to be a fundamentally good facility to have regardless of bpfilter. I will post the whole thing in a bit once I have a chance to dot my i's and cross my t's. 
> By the way, commit 4a9d4b024a3102fc ("switch fput to task_work_add") says > that use of flush_delayed_fput() has to be careful. Al, is it safe to call > flush_delayed_fput() from blob_to_mnt() from umd_load_blob() (which might be > called from both kernel thread and from process context (e.g. init_module() > syscall by /sbin/insmod )) ? And __fput_sync needs to be even more careful. umd_load_blob is called in these changes without any locks held. Fundamentally, in any correct version of this code, we need to flush the file descriptor before we call exec, or exec cannot open it read-only while denying all writes from any other opens. The use case of flush_delayed_fput is exactly the same as that used when loading the initramfs. Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
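The thread_group_exited() predicate quoted in the message above differs from a plain "is a task still attached to the pid" test. A userspace model makes the difference concrete. Everything below is a sketch under assumptions: struct model_task is a hypothetical stand-in for the few task fields the predicate reads, a NULL pointer stands for pid_task() finding no task, and the model assumes, as the discussion implies, that exit_state is already set by the time waiters are woken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the fields the predicate reads. */
struct model_task {
    int exit_state;    /* nonzero once the task has started exiting */
    bool group_empty;  /* true when no other thread remains in the group */
};

/* Models pid_has_task(): a task is still attached to the pid. */
static bool model_pid_has_task(const struct model_task *tsk)
{
    return tsk != NULL;
}

/*
 * Models the quoted predicate: the group has exited when no task is
 * attached, or when the remaining task is exiting and is the last
 * thread of its group.
 */
static bool model_thread_group_exited(const struct model_task *tsk)
{
    return !tsk || (tsk->exit_state && tsk->group_empty);
}
```

For a task that has set exit_state but has not yet been detached, model_pid_has_task() still reports a task while model_thread_group_exited() already reports completion. That is the window in which a waiter using the first test would go back to sleep after its only wake-up.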
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-07-02 13:08 ` Eric W. Biederman @ 2020-07-02 13:40 ` Tetsuo Handa 2020-07-02 16:02 ` Eric W. Biederman 0 siblings, 1 reply; 72+ messages in thread From: Tetsuo Handa @ 2020-07-02 13:40 UTC (permalink / raw) To: Eric W. Biederman Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On 2020/07/02 22:08, Eric W. Biederman wrote: > Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > >> On 2020/06/30 21:29, Eric W. Biederman wrote: >>> Hmm. The wake up happens just of tgid->wait_pidfd happens just before >>> release_task is called so there is a race. As it is possible to wake >>> up and then go back to sleep before pid_has_task becomes false. >> >> What is the reason we want to wait until pid_has_task() becomes false? >> >> - wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); >> + while (!wait_event_timeout(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID), 1)); > > So that it is safe to call bpfilter_umh_cleanup. The previous code > performed the wait by having a callback in do_exit. But bpfilter_umh_cleanup() does only fput(info->pipe_to_umh); fput(info->pipe_from_umh); put_pid(info->tgid); info->tgid = NULL; which is (I think) already safe regardless of the usermode process because bpfilter_umh_cleanup() merely closes one side of two pipes used between two processes and forgets about the usermode process. > > It might be possible to call bpf_umh_cleanup early but I have not done > that analysis. > > To perform the test correctly what I have right now is: Waiting for the termination of a SIGKILLed usermode process is not such simple. 
If a usermode process was killed by the OOM killer, it might take minutes for the killed process to reach do_exit() due to invisible memory allocation dependency chain. Since the OOM killer kicks the OOM reaper, and the OOM reaper forgets about the killed process after one second if mmap_sem could not be held (in order to avoid OOM deadlock), the OOM situation will be eventually solved; but there is no guarantee that the killed process can reach do_exit() in a short period. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-07-02 13:40 ` Tetsuo Handa @ 2020-07-02 16:02 ` Eric W. Biederman 2020-07-03 13:19 ` Tetsuo Handa 0 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:02 UTC (permalink / raw) To: Tetsuo Handa Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > On 2020/07/02 22:08, Eric W. Biederman wrote: >> Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: >> >>> On 2020/06/30 21:29, Eric W. Biederman wrote: >>>> Hmm. The wake up happens just of tgid->wait_pidfd happens just before >>>> release_task is called so there is a race. As it is possible to wake >>>> up and then go back to sleep before pid_has_task becomes false. >>> >>> What is the reason we want to wait until pid_has_task() becomes false? >>> >>> - wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); >>> + while (!wait_event_timeout(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID), 1)); >> >> So that it is safe to call bpfilter_umh_cleanup. The previous code >> performed the wait by having a callback in do_exit. > > But bpfilter_umh_cleanup() does only > > fput(info->pipe_to_umh); > fput(info->pipe_from_umh); > put_pid(info->tgid); > info->tgid = NULL; > > which is (I think) already safe regardless of the usermode process because > bpfilter_umh_cleanup() merely closes one side of two pipes used between > two processes and forgets about the usermode process. It is not safe. Barring bugs, there is only one use of shutdown_umh that matters: the one in fini_umh. The use of the file by the mm must be finished before umd_unload_blob (AKA unmount), which completely frees the filesystem. 
>> It might be possible to call bpf_umh_cleanup early but I have not done >> that analysis. >> >> To perform the test correctly what I have right now is: > > Waiting for the termination of a SIGKILLed usermode process is not > such simple. The waiting is that simple. You are correct it might not be a quick process. A good general principle is to start with something simple and correct for what it does, and then to make it more complicated when real world cases show up, and it can be understood what the real challenges are. I am not going to merge known broken code but I am also not going to overcomplicate it. Dealing with very rare and pathological cases that are not handled or considered today is out of scope for my patchset. Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-07-02 16:02 ` Eric W. Biederman @ 2020-07-03 13:19 ` Tetsuo Handa 2020-07-03 22:25 ` Eric W. Biederman 0 siblings, 1 reply; 72+ messages in thread From: Tetsuo Handa @ 2020-07-03 13:19 UTC (permalink / raw) To: Eric W. Biederman, Al Viro, Casey Schaufler Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Luis Chamberlain, Linus Torvalds On 2020/07/02 22:08, Eric W. Biederman wrote: >> By the way, commit 4a9d4b024a3102fc ("switch fput to task_work_add") says >> that use of flush_delayed_fput() has to be careful. Al, is it safe to call >> flush_delayed_fput() from blob_to_mnt() from umd_load_blob() (which might be >> called from both kernel thread and from process context (e.g. init_module() >> syscall by /sbin/insmod )) ? > > And __fput_sync needs to be even more careful. > umd_load_blob is called in these changes without any locks held. But where is the guarantee that a thread which called flush_delayed_fput() waits for the completion of processing _all_ "struct file" linked into delayed_fput_list ? If some other thread or delayed_fput_work (scheduled by fput_many()) called flush_delayed_fput() between blob_to_mnt()'s fput(file) and flush_delayed_fput() sequence? blob_to_mnt()'s flush_delayed_fput() can miss the "struct file" which needs to be processed before execve(), can't it? Also, I don't know how convoluted the dependency of all "struct file" linked into delayed_fput_list might be, for there can be "struct file" which will not be a simple close of tmpfs file created by blob_to_mnt()'s file_open_root() request. 
On the other hand, although __fput_sync() cannot be called from !PF_KTHREAD threads, there is a guarantee that __fput_sync() waits for the completion of "struct file" which needs to be flushed before execve(), isn't there? > > We fundamentally AKA in any correct version of this code need to flush > the file descriptor before we call exec or exec can not open it a > read-only denying all writes from any other opens. > > The use case of flush_delayed_fput is exactly the same as that used > when loading the initramfs. When loading the initramfs, the number of threads is quite few (which means that the possibility of hitting the race window and convoluted dependency is small). But like EXPORT_SYMBOL_GPL(umd_load_blob) indicates, blob_to_mnt()'s flush_delayed_fput() might be called after many number of threads already started running. On 2020/07/03 1:02, Eric W. Biederman wrote: >>>> On 2020/06/30 21:29, Eric W. Biederman wrote: >>>>> Hmm. The wake up happens just of tgid->wait_pidfd happens just before >>>>> release_task is called so there is a race. As it is possible to wake >>>>> up and then go back to sleep before pid_has_task becomes false. >>>> >>>> What is the reason we want to wait until pid_has_task() becomes false? >>>> >>>> - wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); >>>> + while (!wait_event_timeout(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID), 1)); >>> >>> So that it is safe to call bpfilter_umh_cleanup. The previous code >>> performed the wait by having a callback in do_exit. >> >> But bpfilter_umh_cleanup() does only >> >> fput(info->pipe_to_umh); >> fput(info->pipe_from_umh); >> put_pid(info->tgid); >> info->tgid = NULL; >> >> which is (I think) already safe regardless of the usermode process because >> bpfilter_umh_cleanup() merely closes one side of two pipes used between >> two processes and forgets about the usermode process. > > It is not safe. > > Baring bugs there is only one use of shtudown_umh that matters. 
The one > in fini_umh. The use of the file by the mm must be finished before > umd_unload_blob. AKA unmount. Which completely frees the filesystem. Do we really need to mount upon umd_load_blob() and unmount upon umd_unload_blob() ? LSM modules might prefer only one instance of filesystem for umd blobs. For pathname based LSMs, since that filesystem is not visible from mount tree, only info->driver_name can be used for distinction. Therefore, one instance of filesystem with files created with file_open_root(O_CREAT | O_WRONLY | O_EXCL) might be preferable. For inode based LSMs, reusing one instance of filesystem created upon early boot might be convenient for labeling. Also, we might want a dedicated filesystem (say, "umdfs") instead of regular tmpfs in order to implement protections without labeling files. Then, we might also be able to implement minimal protections without LSMs. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-07-03 13:19 ` Tetsuo Handa @ 2020-07-03 22:25 ` Eric W. Biederman 2020-07-04 6:57 ` Tetsuo Handa 0 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-07-03 22:25 UTC (permalink / raw) To: Tetsuo Handa Cc: Al Viro, Casey Schaufler, Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Luis Chamberlain, Linus Torvalds Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > On 2020/07/02 22:08, Eric W. Biederman wrote: >>> By the way, commit 4a9d4b024a3102fc ("switch fput to task_work_add") says >>> that use of flush_delayed_fput() has to be careful. Al, is it safe to call >>> flush_delayed_fput() from blob_to_mnt() from umd_load_blob() (which might be >>> called from both kernel thread and from process context (e.g. init_module() >>> syscall by /sbin/insmod )) ? >> >> And __fput_sync needs to be even more careful. >> umd_load_blob is called in these changes without any locks held. > > But where is the guarantee that a thread which called flush_delayed_fput() waits for > the completion of processing _all_ "struct file" linked into delayed_fput_list ? > If some other thread or delayed_fput_work (scheduled by fput_many()) called > flush_delayed_fput() between blob_to_mnt()'s fput(file) and flush_delayed_fput() > sequence? blob_to_mnt()'s flush_delayed_fput() can miss the "struct file" which > needs to be processed before execve(), can't it? As a module the guarantee is we call task_work_run. Built into the kernel the guarantee as best I can trace it is that kthreadd hasn't started, and as such nothing that is scheduled has run yet. 
> Also, I don't know how convoluted the dependency of all "struct file" linked into
> delayed_fput_list might be, for there can be "struct file" which will not be a
> simple close of tmpfs file created by blob_to_mnt()'s file_open_root() request.
>
> On the other hand, although __fput_sync() cannot be called from !PF_KTHREAD threads,
> there is a guarantee that __fput_sync() waits for the completion of "struct file"
> which needs to be flushed before execve(), isn't there?

There is really not a good helper or helpers, and this code suggests we
have something better. Right now I have used the existing helpers to
the best of my ability. If you or someone else wants to write a better
version of flushing so that exec can happen be my guest.

As far as I can tell what I have is good enough.

>> We fundamentally AKA in any correct version of this code need to flush
>> the file descriptor before we call exec or exec can not open it a
>> read-only denying all writes from any other opens.
>>
>> The use case of flush_delayed_fput is exactly the same as that used
>> when loading the initramfs.
>
> When loading the initramfs, the number of threads is quite few (which
> means that the possibility of hitting the race window and convoluted
> dependency is small).

But the reality is the code runs very early, before the initramfs is
initialized in practice.

> But like EXPORT_SYMBOL_GPL(umd_load_blob) indicates, blob_to_mnt()'s
> flush_delayed_fput() might be called after many number of threads already
> started running.

At which point the code probably won't be running from a kernel thread
but instead will be running in a thread where task_work_run is relevant.

At worst it is a very small race, where someone else in another thread
starts flushing the file. Which means the file could still be
completely closed before exec. Even that is not necessarily fatal,
as the usermode driver code has a respawn capability.

Code that is used enough that it hits that race sounds like a very
good problem to have from the perspective of the usermode driver code.

> Do we really need to mount upon umd_load_blob() and unmount upon umd_unload_blob() ?
> LSM modules might prefer only one instance of filesystem for umd
> blobs.

It is simple. People are free to change it, but a single filesystem
seems like a very good place to start with this functionality.

> For pathname based LSMs, since that filesystem is not visible from mount tree, only
> info->driver_name can be used for distinction. Therefore, one instance of filesystem
> with files created with file_open_root(O_CREAT | O_WRONLY | O_EXCL)
> might be preferable.

I took a quick look and the creation and removal of files with the
in-kernel helpers is not particularly easy. Certainly it is more work
and thus a higher likelihood of bugs than what I have done.

A directory per driver does sound tempting. Just more work than I am
willing to do.

> For inode based LSMs, reusing one instance of filesystem created upon early boot might
> be convenient for labeling.
>
> Also, we might want a dedicated filesystem (say, "umdfs") instead of regular tmpfs in
> order to implement protections without labeling files. Then, we might also be able to
> implement minimal protections without LSMs.

All valid points. Nothing sets this design in stone.
Nothing says this is the endpoint of the evolution of this code.

The entire point of this patchset for me is that I remove the
unnecessary special cases from exec and do_exit, so I don't have to deal
with the usermode driver code anymore.

Eric

^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-07-03 22:25 ` Eric W. Biederman @ 2020-07-04 6:57 ` Tetsuo Handa 2020-07-08 4:46 ` Eric W. Biederman 0 siblings, 1 reply; 72+ messages in thread From: Tetsuo Handa @ 2020-07-04 6:57 UTC (permalink / raw) To: Eric W. Biederman, Al Viro Cc: Casey Schaufler, Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Luis Chamberlain, Linus Torvalds On 2020/07/04 7:25, Eric W. Biederman wrote: > Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> writes: > >> On 2020/07/02 22:08, Eric W. Biederman wrote: >>>> By the way, commit 4a9d4b024a3102fc ("switch fput to task_work_add") says >>>> that use of flush_delayed_fput() has to be careful. Al, is it safe to call >>>> flush_delayed_fput() from blob_to_mnt() from umd_load_blob() (which might be >>>> called from both kernel thread and from process context (e.g. init_module() >>>> syscall by /sbin/insmod )) ? >>> >>> And __fput_sync needs to be even more careful. >>> umd_load_blob is called in these changes without any locks held. >> >> But where is the guarantee that a thread which called flush_delayed_fput() waits for >> the completion of processing _all_ "struct file" linked into delayed_fput_list ? >> If some other thread or delayed_fput_work (scheduled by fput_many()) called >> flush_delayed_fput() between blob_to_mnt()'s fput(file) and flush_delayed_fput() >> sequence? blob_to_mnt()'s flush_delayed_fput() can miss the "struct file" which >> needs to be processed before execve(), can't it? > > As a module the guarantee is we call task_work_run. No. It is possible that blob_to_mnt() is called by a kernel thread which was started by init_module() syscall by /sbin/insmod . 
> Built into the kernel the guarantee as best I can trace it is that > kthreadd hasn't started, and as such nothing that is scheduled has run > yet. Have you ever checked how early the kthreadd (PID=2) gets started? ---------- --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2306,6 +2306,7 @@ static __latent_entropy struct task_struct *copy_process( trace_task_newtask(p, clone_flags); uprobe_copy_process(p, clone_flags); + printk(KERN_INFO "Created PID: %u Comm: %s\n", p->pid, p->comm); return p; bad_fork_cancel_cgroup: ---------- ---------- [ 0.090757][ T0] pid_max: default: 65536 minimum: 512 [ 0.090890][ T0] LSM: Security Framework initializing [ 0.090890][ T0] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [ 0.090890][ T0] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear) [ 0.090890][ T0] Disabled fast string operations [ 0.090890][ T0] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024 [ 0.090890][ T0] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4 [ 0.090890][ T0] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization [ 0.090890][ T0] Spectre V2 : Spectre mitigation: kernel not compiled with retpoline; no mitigation available! 
[ 0.090890][ T0] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp [ 0.090890][ T0] SRBDS: Unknown: Dependent on hypervisor status [ 0.090890][ T0] MDS: Mitigation: Clear CPU buffers [ 0.090890][ T0] Freeing SMP alternatives memory: 24K [ 0.090890][ T0] Created PID: 1 Comm: swapper/0 [ 0.090890][ T0] Created PID: 2 Comm: swapper/0 [ 0.090890][ T1] smpboot: CPU0: Intel(R) Core(TM) i5-4440S CPU @ 2.80GHz (family: 0x6, model: 0x3c, stepping: 0x3) [ 0.091000][ T2] Created PID: 3 Comm: kthreadd [ 0.091995][ T2] Created PID: 4 Comm: kthreadd [ 0.093028][ T2] Created PID: 5 Comm: kthreadd [ 0.093997][ T2] Created PID: 6 Comm: kthreadd [ 0.094995][ T2] Created PID: 7 Comm: kthreadd [ 0.096037][ T2] Created PID: 8 Comm: kthreadd (...snipped...) [ 0.135716][ T2] Created PID: 13 Comm: kthreadd [ 0.135716][ T1] smp: Bringing up secondary CPUs ... [ 0.135716][ T2] Created PID: 14 Comm: kthreadd [ 0.135716][ T2] Created PID: 15 Comm: kthreadd [ 0.135716][ T2] Created PID: 16 Comm: kthreadd [ 0.135716][ T2] Created PID: 17 Comm: kthreadd [ 0.135716][ T2] Created PID: 18 Comm: kthreadd [ 0.135716][ T1] x86: Booting SMP configuration: (...snipped...) [ 0.901990][ T1] pci 0000:00:00.0: Limiting direct PCI/PCI transfers [ 0.902145][ T1] pci 0000:00:0f.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff] [ 0.902213][ T1] pci 0000:02:00.0: CLS mismatch (32 != 64), using 64 bytes [ 0.902224][ T1] Trying to unpack rootfs image as initramfs... 
[ 1.107993][ T1] Freeing initrd memory: 18876K [ 1.109049][ T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) [ 1.111003][ T1] software IO TLB: mapped [mem 0xab000000-0xaf000000] (64MB) [ 1.112136][ T1] check: Scanning for low memory corruption every 60 seconds [ 1.115040][ T2] Created PID: 52 Comm: kthreadd [ 1.116110][ T1] workingset: timestamp_bits=46 max_order=20 bucket_order=0 [ 1.120936][ T1] SGI XFS with ACLs, security attributes, verbose warnings, quota, no debug enabled [ 1.129626][ T2] Created PID: 53 Comm: kthreadd [ 1.131403][ T2] Created PID: 54 Comm: kthreadd ---------- kthreadd (PID=2) is created by swapper/0 (PID=0) immediately after init (PID=1) was created by swapper/0 (PID=0). It is even before secondary CPUs are brought up, and far earlier than unpacking initramfs. And how can we prove that blob_to_mnt() is only called by a kernel thread before some kernel thread that interferes fput() starts running? blob_to_mnt() needs to be prepared for being called after many processes already started running. > >> Also, I don't know how convoluted the dependency of all "struct file" linked into >> delayed_fput_list might be, for there can be "struct file" which will not be a >> simple close of tmpfs file created by blob_to_mnt()'s file_open_root() request. >> >> On the other hand, although __fput_sync() cannot be called from !PF_KTHREAD threads, >> there is a guarantee that __fput_sync() waits for the completion of "struct file" >> which needs to be flushed before execve(), isn't there? > > There is really not a good helper or helpers, and this code suggests we > have something better. Right now I have used the existing helpers to > the best of my ability. If you or someone else wants to write a better > version of flushing so that exec can happen be my guest. > > As far as I can tell what I have is good enough. Just saying what you think is not a "review". 
I'm waiting for an answer from Al Viro because I consider that Al will be the most
familiar with fput()'s behavior.

At least I consider that

	if (current->flags & PF_KTHREAD) {
		__fput_sync(file);
	} else {
		fput(file);
		task_work_run();
	}

is a candidate for closing the race window. And depending on Al's answer,
removing

	BUG_ON(!(task->flags & PF_KTHREAD));

from __fput_sync() and unconditionally using

	__fput_sync(file);

from blob_to_mnt() might be the better choice. Anyway, I consider that Al's response
is important for this "review".

>
>>> We fundamentally AKA in any correct version of this code need to flush
>>> the file descriptor before we call exec or exec can not open it a
>>> read-only denying all writes from any other opens.
>>>
>>> The use case of flush_delayed_fput is exactly the same as that used
>>> when loading the initramfs.
>>
>> When loading the initramfs, the number of threads is quite few (which
>> means that the possibility of hitting the race window and convoluted
>> dependency is small).
>
> But the reality is the code runs very early, before the initramfs is
> initialized in practice.

Such an expectation is not reality.

>
>> But like EXPORT_SYMBOL_GPL(umd_load_blob) indicates, blob_to_mnt()'s
>> flush_delayed_fput() might be called after many number of threads already
>> started running.
>
> At which point the code probably won't be running from a kernel thread
> but instead will be running in a thread where task_work_run is relevant.

No. It is possible that blob_to_mnt() is called by a kernel thread which was
started by init_module() syscall by /sbin/insmod .

>
> At worst it is a very small race, where someone else in another thread
> starts flushing the file. Which means the file could still be
> completely closed before exec. Even that is not necessarily fatal,
> as the usermode driver code has a respawn capability.
> > Code that is used enough that it hits that race sounds like a very > good problem to have from the perspective of the usermode driver code. In general, unconditionally retrying call_usermodehelper() when it returned a negative value (e.g. -ENOENT, -ENOMEM, -EBUSY) is bad. I don't know which code is an implementation of "a respawn capability"; I'd like to check where that code is and whether that code is checking -ETXTBSY. ^ permalink raw reply [flat|nested] 72+ messages in thread
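The concern about unconditional retries can be illustrated with a small userspace sketch. It assumes kernel-style return values (0 on success, a negative errno on failure); umd_exec_should_retry() and its attempts_left parameter are hypothetical names chosen for illustration and do not appear in this series. Only -ETXTBSY, the error exec returns while the file is still open for writing, is treated as transient here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical policy helper (not a kernel function): decide whether a
 * failed exec of the driver binary is worth another attempt.
 *
 * err           kernel-style result, 0 on success or -errno on failure
 * attempts_left how many further attempts the caller still allows
 */
bool umd_exec_should_retry(int err, unsigned int attempts_left)
{
	assert(err <= 0);	/* positive values are not valid results here */

	/* Nothing to retry on success, or when the budget is used up. */
	if (err == 0 || attempts_left == 0)
		return false;

	/*
	 * -ENOENT, -ENOMEM, -EBUSY and friends will not go away by simply
	 * trying again; only the "still open for write" case may.
	 */
	return err == -ETXTBSY;
}
```

A caller following this sketch would give up immediately on -ENOENT or -ENOMEM and would bound the number of -ETXTBSY retries instead of looping forever.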
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-07-04 6:57 ` Tetsuo Handa @ 2020-07-08 4:46 ` Eric W. Biederman 0 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-08 4:46 UTC (permalink / raw) To: Tetsuo Handa Cc: Al Viro, Casey Schaufler, Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexei Starovoitov, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Luis Chamberlain, Linus Torvalds

Just to make certain I understand what is going on I instrumented a
kernel with some print statements.

a) The workqueues and timers start before populate_rootfs.

b) populate_rootfs does indeed happen long before the bpfilter
   module is initialized.

c) What prevents populate_rootfs and the umd_load_blob from
   having problems when they call flush_delayed_fput is the
   fact that fput_many does:
   "schedule_delayed_work(&delayed_fput_work,1)".

   That 1 requests a delay of at least 1 jiffy. A jiffy is between
   1ms and 10ms depending on how Linux is configured.

   In my test configuration running a kernel in kvm printing to a serial
   console I measured 0.8ms between the fput in blob_to_mnt and
   flush_delayed_fput which immediately follows it.

   So unless the fput becomes incredibly slow there is nothing to worry
   about in blob_to_mnt.

d) The same mechanism is used by populate_rootfs, so a bug in the
   mechanism applies to both.

e) No one appears to have reported a problem executing files out of
   initramfs these last several years since the flush_delayed_fput was
   introduced.

f) The code works for me. There is real reason to believe the code will
   work for everyone else, as the exact same logic is used by initramfs.
   So it should be perfectly fine for the patchset and the
   usermode_driver code to go ahead as written.

h) If there is something to be fixed it is flush_delayed_fput as that is
   much more important than anything in the usermode driver code.

Eric

p.s.) When I talked of restarts of the usermode driver code earlier I was
referring to the code that restarts the usermode driver if it is killed,
the next time the kernel tries to talk to it. That could mask an
-ETXTBSY, except that if it happens on the first exec
net/bpfilter/bpfilter_kern.c:load_umh() will return an error.

^ permalink raw reply [flat|nested] 72+ messages in thread
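The arithmetic behind point c) can be written out as a short sketch. It assumes that the delay passed to schedule_delayed_work() is a lower bound of that many jiffies and that one jiffy lasts 1/HZ seconds; jiffy_usecs() and gap_within_delay() are hypothetical helper names, and the HZ values used in the usage note are common configuration choices rather than anything measured in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Length of one jiffy in microseconds for a given HZ setting. */
unsigned int jiffy_usecs(unsigned int hz)
{
	assert(hz > 0 && hz <= 1000000);
	return 1000000u / hz;
}

/*
 * True if a gap of gap_usecs between fput() and the following
 * flush_delayed_fput() is shorter than the minimum delay of
 * delay_jiffies jiffies, i.e. the delayed work cannot yet have been
 * started by its timer when the explicit flush runs.
 */
bool gap_within_delay(unsigned int gap_usecs, unsigned int hz,
		      unsigned int delay_jiffies)
{
	return gap_usecs < delay_jiffies * jiffy_usecs(hz);
}
```

With the measured 0.8ms gap, gap_within_delay(800, 1000, 1) and gap_within_delay(800, 100, 1) both hold: the margin is about 0.2ms at HZ=1000 and about 9.2ms at HZ=100.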
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-30 12:29 ` Eric W. Biederman 2020-06-30 13:21 ` Tetsuo Handa @ 2020-06-30 16:52 ` Alexei Starovoitov 2020-07-01 17:12 ` Eric W. Biederman 1 sibling, 1 reply; 72+ messages in thread From: Alexei Starovoitov @ 2020-06-30 16:52 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On Tue, Jun 30, 2020 at 07:29:34AM -0500, Eric W. Biederman wrote: > > diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c > index 91474884ddb7..3e1874030daa 100644 > --- a/net/bpfilter/bpfilter_kern.c > +++ b/net/bpfilter/bpfilter_kern.c > @@ -19,8 +19,8 @@ static void shutdown_umh(void) > struct pid *tgid = info->tgid; > > if (tgid) { > - kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); > - wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); > + kill_pid(tgid, SIGKILL, 1); > + wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); > bpfilter_umh_cleanup(info); > } > } > > > And then did: > > while true; do iptables -L;rmmod bpfilter; done > > > > Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event(). > > Hmm. The wake up happens just of tgid->wait_pidfd happens just before > release_task is called so there is a race. As it is possible to wake > up and then go back to sleep before pid_has_task becomes false. > > So I think I need a friendly helper that does: > > bool task_has_exited(struct pid *tgid) > { > bool exited = false; > > rcu_read_lock(); > tsk = pid_task(tgid, PIDTYPE_TGID); > exited = !!tsk; > if (tsk) { > exited = !!tsk->exit_state; > out: > rcu_unlock(); > return exited; > } All makes sense to me. If I understood the race condition such helper should indeed solve it. 
Are you going to add such patch to your series? I'll proceed with my work on top of your series and will ignore this race for now, but I think it should be fixed before we land this set into multiple trees. ^ permalink raw reply [flat|nested] 72+ messages in thread
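The helper quoted above is only a rough outline (the braces do not balance and "exited = !!tsk" reads inverted), so here is a userspace model of what it appears to intend. This is a sketch under stated assumptions, not the code that was merged: struct model_task, struct model_pid and model_task_has_exited() are hypothetical stand-ins, and the model assumes that exit_state is already set by the time the waiter is woken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct. */
struct model_task {
	int exit_state;		/* nonzero once the task has exited */
};

/* Hypothetical stand-in for struct pid. */
struct model_pid {
	struct model_task *task;	/* NULL once the task was released */
};

/*
 * Report "exited" both when no task is attached to the pid any more and
 * when a task is still attached but has already set exit_state.  Under
 * the assumption above, a waiter woken just before the task is released
 * sees the condition as true and does not go back to sleep.
 */
bool model_task_has_exited(const struct model_pid *tgid)
{
	const struct model_task *tsk;

	assert(tgid != NULL);

	/* Kernel code would do this lookup under rcu_read_lock(). */
	tsk = tgid->task;
	return tsk == NULL || tsk->exit_state != 0;
}
```

The difference from waiting on !pid_has_task() alone is the second half of the condition: the window between the wake up and release_task() is covered by looking at exit_state.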
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-30 16:52 ` Alexei Starovoitov @ 2020-07-01 17:12 ` Eric W. Biederman 0 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-01 17:12 UTC (permalink / raw) To: Alexei Starovoitov Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Tue, Jun 30, 2020 at 07:29:34AM -0500, Eric W. Biederman wrote: >> >> diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c >> index 91474884ddb7..3e1874030daa 100644 >> --- a/net/bpfilter/bpfilter_kern.c >> +++ b/net/bpfilter/bpfilter_kern.c >> @@ -19,8 +19,8 @@ static void shutdown_umh(void) >> struct pid *tgid = info->tgid; >> >> if (tgid) { >> - kill_pid_info(SIGKILL, SEND_SIG_PRIV, tgid); >> - wait_event(tgid->wait_pidfd, !pid_task(tgid, PIDTYPE_TGID)); >> + kill_pid(tgid, SIGKILL, 1); >> + wait_event(tgid->wait_pidfd, !pid_has_task(tgid, PIDTYPE_TGID)); >> bpfilter_umh_cleanup(info); >> } >> } >> >> > And then did: >> > while true; do iptables -L;rmmod bpfilter; done >> > >> > Unfortunately sometimes 'rmmod bpfilter' hangs in wait_event(). >> >> Hmm. The wake up happens just of tgid->wait_pidfd happens just before >> release_task is called so there is a race. As it is possible to wake >> up and then go back to sleep before pid_has_task becomes false. >> >> So I think I need a friendly helper that does: >> >> bool task_has_exited(struct pid *tgid) >> { >> bool exited = false; >> >> rcu_read_lock(); >> tsk = pid_task(tgid, PIDTYPE_TGID); >> exited = !!tsk; >> if (tsk) { >> exited = !!tsk->exit_state; >> out: >> rcu_unlock(); >> return exited; >> } > > All makes sense to me. 
> If I understood the race condition such helper should indeed solve it. > Are you going to add such patch to your series? > I'll proceed with my work on top of your series and will ignore this > race for now, but I think it should be fixed before we land this set > into multiple trees. Yes. I am just finishing it up now. Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH v3 00/16] Make the user mode driver code a better citizen 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (15 preceding siblings ...) 2020-06-29 22:12 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Alexei Starovoitov @ 2020-07-02 16:40 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 01/16] umh: Capture the pid in umh_pipe_setup Eric W. Biederman ` (17 more replies) 2020-07-08 5:20 ` [PATCH v2 00/15] " Luis Chamberlain 17 siblings, 18 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:40 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner This is the third round of my changeset to split the user mode driver code from the user mode helper code, and to make the code use common facilities to get things done instead of recreating them just for the user mode driver code. I have split the changes into small enough pieces so they should be easily readable and testable. The changes lean into the preexisting interfaces in the kernel and remove special cases for user mode driver code in favor of solutions that don't need special cases. This results in smaller code with fewer bugs. At a practical level this removes the maintenance burden of the user mode drivers from the user mode helper code and from exec as the special cases are removed. Similarly the LSM interaction bugs are fixed by not having unnecessary special cases for user mode drivers. 
I have tested these changes by booting with the code compiled in and by killing "bpfilter_umh" and running "iptables -vnL" to restart the userspace driver, and also by running "while true; do iptables -L;rmmod bpfilter; done" to verify that module load and unload work properly. I have compile tested each change with and without CONFIG_BPFILTER enabled.

From v2 to v3 I have made two significant changes.
- I factored thread_group_exited out of pidfd_poll to allow the test to be used by the bpfilter code.
- I renamed umd.c and umd.h to usermode_driver.c and usermode_driver.h respectively.

I made a few very small changes from v1 to v2:
- Updated the function name in a comment when the function is renamed
- Moved some more code so that the !CONFIG_BPFILTER case continues to compile when I moved the code into umd.c
- A fix for the module loading case to really flush the file descriptor.
- Removed split_argv entirely from fork_usermode_driver. There was nothing to split so it was just confusing.

Please let me know if you see any bugs. Once the code review is finished I plan to place the code in a non-rebasing branch so I can pull it into my tree and so it can also be pulled into the bpf-next tree.

v1: https://lkml.kernel.org/r/87pn9mgfc2.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87bll17ili.fsf_-_@x220.int.ebiederm.org

Eric W. Biederman (16): umh: Capture the pid in umh_pipe_setup umh: Move setting PF_UMH into umh_pipe_setup umh: Rename the user mode driver helpers for clarity umh: Remove call_usermodehelper_setup_file.
umh: Separate the user mode driver and the user mode helper support umd: For clarity rename umh_info umd_info umd: Rename umd_info.cmdline umd_info.driver_name umd: Transform fork_usermode_blob into fork_usermode_driver umh: Stop calling do_execve_file exec: Remove do_execve_file bpfilter: Move bpfilter_umh back into init data umd: Track user space drivers with struct pid exit: Factor thread_group_exited out of pidfd_poll bpfilter: Take advantage of the facilities of struct pid umd: Remove exit_umh umd: Stop using split_argv fs/exec.c | 38 ++------ include/linux/binfmts.h | 1 - include/linux/bpfilter.h | 7 +- include/linux/sched.h | 9 -- include/linux/sched/signal.h | 2 + include/linux/umh.h | 15 ---- include/linux/usermode_driver.h | 18 ++++ kernel/Makefile | 1 + kernel/exit.c | 25 +++++- kernel/fork.c | 6 +- kernel/umh.c | 171 +----------------------------------- kernel/usermode_driver.c | 182 +++++++++++++++++++++++++++++++++++++++ net/bpfilter/bpfilter_kern.c | 38 ++++---- net/bpfilter/bpfilter_umh_blob.S | 2 +- net/ipv4/bpfilter/sockopt.c | 20 +++-- 15 files changed, 275 insertions(+), 260 deletions(-) Eric W. Biederman (15): umh: Capture the pid in umh_pipe_setup umh: Move setting PF_UMH into umh_pipe_setup umh: Rename the user mode driver helpers for clarity umh: Remove call_usermodehelper_setup_file. 
umh: Separate the user mode driver and the user mode helper support umd: For clarity rename umh_info umd_info umd: Rename umd_info.cmdline umd_info.driver_name umd: Transform fork_usermode_blob into fork_usermode_driver umh: Stop calling do_execve_file exec: Remove do_execve_file bpfilter: Move bpfilter_umh back into init data umd: Track user space drivers with struct pid bpfilter: Take advantage of the facilities of struct pid umd: Remove exit_umh umd: Stop using split_argv fs/exec.c | 38 ++------ include/linux/binfmts.h | 1 - include/linux/bpfilter.h | 7 +- include/linux/sched.h | 9 -- include/linux/umd.h | 18 ++++ include/linux/umh.h | 15 ---- kernel/Makefile | 1 + kernel/exit.c | 1 - kernel/umd.c | 182 +++++++++++++++++++++++++++++++++++++++ kernel/umh.c | 171 +----------------------------------- net/bpfilter/bpfilter_kern.c | 38 ++++---- net/bpfilter/bpfilter_umh_blob.S | 2 +- net/ipv4/bpfilter/sockopt.c | 20 +++-- 13 files changed, 248 insertions(+), 255 deletions(-) ^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH v3 01/16] umh: Capture the pid in umh_pipe_setup 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 02/16] umh: Move setting PF_UMH into umh_pipe_setup Eric W. Biederman ` (16 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman The pid in struct subprocess_info is only used by umh_clean_and_save_pid to write the pid into umh_info. Instead always capture the pid on struct umh_info in umh_pipe_setup, removing code that is specific to user mode drivers from the common user path of user mode helpers. v1: https://lkml.kernel.org/r/87h7uygf9i.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/875zb97iix.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/umh.h | 1 - kernel/umh.c | 5 ++--- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/include/linux/umh.h b/include/linux/umh.h index 0c08de356d0d..aae16a0ebd0f 100644 --- a/include/linux/umh.h +++ b/include/linux/umh.h @@ -25,7 +25,6 @@ struct subprocess_info { struct file *file; int wait; int retval; - pid_t pid; int (*init)(struct subprocess_info *info, struct cred *new); void (*cleanup)(struct subprocess_info *info); void *data; diff --git a/kernel/umh.c b/kernel/umh.c index 79f139a7ca03..c2a582b3a2bf 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -102,7 +102,6 @@ static int call_usermodehelper_exec_async(void *data) commit_creds(new); - sub_info->pid = task_pid_nr(current); if (sub_info->file) { retval = do_execve_file(sub_info->file, sub_info->argv, sub_info->envp); @@ -468,6 +467,7 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) umh_info->pipe_to_umh = to_umh[1]; umh_info->pipe_from_umh = from_umh[0]; + umh_info->pid = task_pid_nr(current); return 0; } @@ -476,13 +476,12 @@ static void umh_clean_and_save_pid(struct subprocess_info *info) struct umh_info *umh_info = info->data; /* cleanup if umh_pipe_setup() was successful but exec failed */ - if (info->pid && info->retval) { + if (info->retval) { fput(umh_info->pipe_to_umh); fput(umh_info->pipe_from_umh); } argv_free(info->argv); - umh_info->pid = info->pid; } /** -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v3 02/16] umh: Move setting PF_UMH into umh_pipe_setup 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 01/16] umh: Capture the pid in umh_pipe_setup Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 03/16] umh: Rename the user mode driver helpers for clarity Eric W. Biederman ` (15 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman I am separating the code specific to user mode drivers from the code for ordinary user space helpers. Move setting of PF_UMH from call_usermodehelper_exec_async which is core user mode helper code into umh_pipe_setup which is user mode driver code. The code is equally as easy to write in one location as the other and the movement minimizes the impact of the user mode driver code on the core of the user mode helper code. Setting PF_UMH unconditionally is harmless as an action will only happen if it is paired with an entry on umh_list. v1: https://lkml.kernel.org/r/87bll6gf8t.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87zh8l63xs.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- kernel/umh.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/kernel/umh.c b/kernel/umh.c index c2a582b3a2bf..e6b9d6636850 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -102,12 +102,10 @@ static int call_usermodehelper_exec_async(void *data) commit_creds(new); - if (sub_info->file) { + if (sub_info->file) retval = do_execve_file(sub_info->file, sub_info->argv, sub_info->envp); - if (!retval) - current->flags |= PF_UMH; - } else + else retval = do_execve(getname_kernel(sub_info->path), (const char __user *const __user *)sub_info->argv, (const char __user *const __user *)sub_info->envp); @@ -468,6 +466,7 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) umh_info->pipe_to_umh = to_umh[1]; umh_info->pipe_from_umh = from_umh[0]; umh_info->pid = task_pid_nr(current); + current->flags |= PF_UMH; return 0; } -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
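The claim in the changelog that setting PF_UMH unconditionally is harmless can be pictured with a small userspace model of the exit-time check. It assumes that the exit hook acts only when the flag is set and a matching entry is found on the list; MODEL_PF_UMH, struct model_task, struct model_umh_entry and model_exit_umh() are hypothetical names, and the flag value is not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PF_UMH 0x1u	/* hypothetical flag bit, not the kernel's value */

/* Hypothetical stand-in for the exiting task. */
struct model_task {
	unsigned int flags;
	int pid;
};

/* Hypothetical stand-in for one entry on umh_list. */
struct model_umh_entry {
	int pid;
	bool cleaned;
};

/*
 * Model of the exit-time hook: returns true only if cleanup ran.
 * A task without the flag is ignored, and a task with the flag but
 * without a list entry (an ordinary helper) is ignored as well.
 */
bool model_exit_umh(const struct model_task *tsk,
		    struct model_umh_entry *list, size_t n)
{
	size_t i;

	assert(tsk != NULL && (list != NULL || n == 0));

	if (!(tsk->flags & MODEL_PF_UMH))
		return false;

	for (i = 0; i < n; i++) {
		if (list[i].pid == tsk->pid) {
			list[i].cleaned = true;
			return true;
		}
	}
	return false;
}
```

In this model an ordinary helper that carries the flag but was never registered simply falls through the loop, which is the "harmless" case described above.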
* [PATCH v3 03/16] umh: Rename the user mode driver helpers for clarity 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 01/16] umh: Capture the pid in umh_pipe_setup Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 02/16] umh: Move setting PF_UMH into umh_pipe_setup Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 04/16] umh: Remove call_usermodehelper_setup_file Eric W. Biederman ` (14 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman Now that the functionality of umh_pipe_setup and umh_clean_and_save_pid has changed, their names are too specific and don't make much sense. Instead name them umd_setup and umd_cleanup for their functional role in setting up user mode drivers. v1: https://lkml.kernel.org/r/875zbegf82.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87tuyt63x3.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- kernel/umh.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/umh.c b/kernel/umh.c index e6b9d6636850..26c3d493f168 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -429,7 +429,7 @@ struct subprocess_info *call_usermodehelper_setup_file(struct file *file, return sub_info; } -static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) +static int umd_setup(struct subprocess_info *info, struct cred *new) { struct umh_info *umh_info = info->data; struct file *from_umh[2]; @@ -470,11 +470,11 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) return 0; } -static void umh_clean_and_save_pid(struct subprocess_info *info) +static void umd_cleanup(struct subprocess_info *info) { struct umh_info *umh_info = info->data; - /* cleanup if umh_pipe_setup() was successful but exec failed */ + /* cleanup if umh_setup() was successful but exec failed */ if (info->retval) { fput(umh_info->pipe_to_umh); fput(umh_info->pipe_from_umh); @@ -520,8 +520,8 @@ int fork_usermode_blob(void *data, size_t len, struct umh_info *info) } err = -ENOMEM; - sub_info = call_usermodehelper_setup_file(file, umh_pipe_setup, - umh_clean_and_save_pid, info); + sub_info = call_usermodehelper_setup_file(file, umd_setup, umd_cleanup, + info); if (!sub_info) goto out; -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v3 04/16] umh: Remove call_usermodehelper_setup_file. 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (2 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 03/16] umh: Rename the user mode driver helpers for clarity Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 05/16] umh: Separate the user mode driver and the user mode helper support Eric W. Biederman ` (13 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman The only caller of call_usermodehelper_setup_file is fork_usermode_blob. In fork_usermode_blob replace call_usermodehelper_setup_file with call_usermodehelper_setup and delete call_usermodehelper_setup_file. For this to work the argv_free is moved from umd_cleanup (formerly umh_clean_and_save_pid) to fork_usermode_blob. v1: https://lkml.kernel.org/r/87zh8qf0mp.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87o8p163u1.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/umh.h | 3 --- kernel/umh.c | 42 +++++++++++------------------------------- 2 files changed, 11 insertions(+), 34 deletions(-) diff --git a/include/linux/umh.h b/include/linux/umh.h index aae16a0ebd0f..de08af00c68a 100644 --- a/include/linux/umh.h +++ b/include/linux/umh.h @@ -39,9 +39,6 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp, int (*init)(struct subprocess_info *info, struct cred *new), void (*cleanup)(struct subprocess_info *), void *data); -struct subprocess_info *call_usermodehelper_setup_file(struct file *file, - int (*init)(struct subprocess_info *info, struct cred *new), - void (*cleanup)(struct subprocess_info *), void *data); struct umh_info { const char *cmdline; struct file *pipe_to_umh; diff --git a/kernel/umh.c b/kernel/umh.c index 26c3d493f168..b8fa9b99b366 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -402,33 +402,6 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, } EXPORT_SYMBOL(call_usermodehelper_setup); -struct subprocess_info *call_usermodehelper_setup_file(struct file *file, - int (*init)(struct subprocess_info *info, struct cred *new), - void (*cleanup)(struct subprocess_info *info), void *data) -{ - struct subprocess_info *sub_info; - struct umh_info *info = data; - const char *cmdline = (info->cmdline) ? 
info->cmdline : "usermodehelper"; - - sub_info = kzalloc(sizeof(struct subprocess_info), GFP_KERNEL); - if (!sub_info) - return NULL; - - sub_info->argv = argv_split(GFP_KERNEL, cmdline, NULL); - if (!sub_info->argv) { - kfree(sub_info); - return NULL; - } - - INIT_WORK(&sub_info->work, call_usermodehelper_exec_work); - sub_info->path = "none"; - sub_info->file = file; - sub_info->init = init; - sub_info->cleanup = cleanup; - sub_info->data = data; - return sub_info; -} - static int umd_setup(struct subprocess_info *info, struct cred *new) { struct umh_info *umh_info = info->data; @@ -479,8 +452,6 @@ static void umd_cleanup(struct subprocess_info *info) fput(umh_info->pipe_to_umh); fput(umh_info->pipe_from_umh); } - - argv_free(info->argv); } /** @@ -501,7 +472,9 @@ static void umd_cleanup(struct subprocess_info *info) */ int fork_usermode_blob(void *data, size_t len, struct umh_info *info) { + const char *cmdline = (info->cmdline) ? info->cmdline : "usermodehelper"; struct subprocess_info *sub_info; + char **argv = NULL; struct file *file; ssize_t written; loff_t pos = 0; @@ -520,11 +493,16 @@ int fork_usermode_blob(void *data, size_t len, struct umh_info *info) } err = -ENOMEM; - sub_info = call_usermodehelper_setup_file(file, umd_setup, umd_cleanup, - info); + argv = argv_split(GFP_KERNEL, cmdline, NULL); + if (!argv) + goto out; + + sub_info = call_usermodehelper_setup("none", argv, NULL, GFP_KERNEL, + umd_setup, umd_cleanup, info); if (!sub_info) goto out; + sub_info->file = file; err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); if (!err) { mutex_lock(&umh_list_lock); @@ -532,6 +510,8 @@ int fork_usermode_blob(void *data, size_t len, struct umh_info *info) mutex_unlock(&umh_list_lock); } out: + if (argv) + argv_free(argv); fput(file); return err; } -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
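The ownership change in this patch (the caller now splits the argv, lends it to the setup code, and frees it on a single exit path) can be sketched in userspace C. Everything named sketch_* below is a hypothetical stand-in: sketch_argv_split only splits on spaces and is not the kernel's argv_split, and the callback takes the place of the setup-and-exec step. The sketch does borrow one idea from the kernel helper, keeping the duplicated string in a hidden slot in front of the returned array so one free routine can release both allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for argv_split(): split cmdline on spaces. */
static char **sketch_argv_split(const char *cmdline, int *argcp)
{
	size_t len, max_words;
	char **slots;
	char *copy, *p;
	int argc = 0;

	assert(cmdline != NULL);
	len = strlen(cmdline);
	max_words = len / 2 + 1; /* upper bound on space-separated words */

	/* One hidden slot, the words, and a terminating NULL (from calloc). */
	slots = calloc(max_words + 2, sizeof(*slots));
	copy = malloc(len + 1);
	if (!slots || !copy) {
		free(slots);
		free(copy);
		return NULL;
	}
	memcpy(copy, cmdline, len + 1);
	slots[0] = copy; /* hidden slot keeps the string reachable for free */

	p = copy;
	while (*p) {
		while (*p == ' ')
			*p++ = '\0'; /* terminate the previous word */
		if (!*p)
			break;
		slots[1 + argc++] = p;
		while (*p && *p != ' ')
			p++;
	}
	if (argcp)
		*argcp = argc;
	return slots + 1;
}

static void sketch_argv_free(char **argv)
{
	if (!argv)
		return;
	free(argv[-1]); /* the duplicated command line */
	free(argv - 1); /* the slot array itself */
}

/* Example callback: borrows argv, never frees it, returns the word count. */
static int sketch_count_args(char **argv)
{
	int argc = 0;

	while (argv[argc])
		argc++;
	return argc;
}

/* Models the caller after this patch: split, lend, free on one exit path. */
static int sketch_fork_driver(const char *cmdline, int (*run)(char **argv))
{
	char **argv;
	int err = -ENOMEM;

	argv = sketch_argv_split(cmdline, NULL);
	if (!argv)
		goto out;
	err = run(argv);
out:
	if (argv)
		sketch_argv_free(argv);
	return err;
}
```

Keeping allocation and release in the same function means the cleanup callback no longer needs to know how the argv was built, which is the property the patch relies on.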
* [PATCH v3 05/16] umh: Separate the user mode driver and the user mode helper support 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (3 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 04/16] umh: Remove call_usermodehelper_setup_file Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 06/16] umd: For clarity rename umh_info umd_info Eric W. Biederman ` (12 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman This makes it clear which code is part of the core user mode helper support and which code is needed to implement user mode drivers. This makes the kernel smaller for everyone who does not use a usermode driver. v1: https://lkml.kernel.org/r/87tuyyf0ln.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87imf963s6.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/bpfilter.h | 2 +- include/linux/sched.h | 8 -- include/linux/umh.h | 10 --- include/linux/usermode_driver.h | 30 +++++++ kernel/Makefile | 1 + kernel/exit.c | 1 + kernel/umh.c | 139 ------------------------------ kernel/usermode_driver.c | 146 ++++++++++++++++++++++++++++++++ 8 files changed, 179 insertions(+), 158 deletions(-) create mode 100644 include/linux/usermode_driver.h create mode 100644 kernel/usermode_driver.c diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h index d815622cd31e..d6d6206052a6 100644 --- a/include/linux/bpfilter.h +++ b/include/linux/bpfilter.h @@ -3,7 +3,7 @@ #define _LINUX_BPFILTER_H #include <uapi/linux/bpfilter.h> -#include <linux/umh.h> +#include <linux/usermode_driver.h> struct sock; int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval, diff --git a/include/linux/sched.h b/include/linux/sched.h index b62e6aaf28f0..59d1e92bb88e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2020,14 +2020,6 @@ static inline void rseq_execve(struct task_struct *t) #endif -void __exit_umh(struct task_struct *tsk); - -static inline void exit_umh(struct task_struct *tsk) -{ - if (unlikely(tsk->flags & PF_UMH)) - __exit_umh(tsk); -} - #ifdef CONFIG_DEBUG_RSEQ void rseq_syscall(struct pt_regs *regs); diff --git a/include/linux/umh.h b/include/linux/umh.h index de08af00c68a..73173c4a07e5 100644 --- a/include/linux/umh.h +++ b/include/linux/umh.h @@ -39,16 +39,6 @@ call_usermodehelper_setup(const char *path, char **argv, char **envp, int (*init)(struct subprocess_info *info, struct cred *new), void (*cleanup)(struct subprocess_info *), void *data); -struct umh_info { - const char *cmdline; - struct file *pipe_to_umh; - struct file *pipe_from_umh; - struct list_head list; - void (*cleanup)(struct umh_info *info); - pid_t pid; -}; -int fork_usermode_blob(void *data, size_t len, struct umh_info *info); - extern int call_usermodehelper_exec(struct 
subprocess_info *info, int wait); diff --git a/include/linux/usermode_driver.h b/include/linux/usermode_driver.h new file mode 100644 index 000000000000..c5f6dc950227 --- /dev/null +++ b/include/linux/usermode_driver.h @@ -0,0 +1,30 @@ +#ifndef __LINUX_USERMODE_DRIVER_H__ +#define __LINUX_USERMODE_DRIVER_H__ + +#include <linux/umh.h> + +#ifdef CONFIG_BPFILTER +void __exit_umh(struct task_struct *tsk); + +static inline void exit_umh(struct task_struct *tsk) +{ + if (unlikely(tsk->flags & PF_UMH)) + __exit_umh(tsk); +} +#else +static inline void exit_umh(struct task_struct *tsk) +{ +} +#endif + +struct umh_info { + const char *cmdline; + struct file *pipe_to_umh; + struct file *pipe_from_umh; + struct list_head list; + void (*cleanup)(struct umh_info *info); + pid_t pid; +}; +int fork_usermode_blob(void *data, size_t len, struct umh_info *info); + +#endif /* __LINUX_USERMODE_DRIVER_H__ */ diff --git a/kernel/Makefile b/kernel/Makefile index f3218bc5ec69..43928759893a 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -12,6 +12,7 @@ obj-y = fork.o exec_domain.o panic.o \ notifier.o ksysfs.o cred.o reboot.o \ async.o range.o smpboot.o ucount.o +obj-$(CONFIG_BPFILTER) += usermode_driver.o obj-$(CONFIG_MODULES) += kmod.o obj-$(CONFIG_MULTIUSER) += groups.o diff --git a/kernel/exit.c b/kernel/exit.c index 727150f28103..a081deea52ca 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -63,6 +63,7 @@ #include <linux/random.h> #include <linux/rcuwait.h> #include <linux/compat.h> +#include <linux/usermode_driver.h> #include <linux/uaccess.h> #include <asm/unistd.h> diff --git a/kernel/umh.c b/kernel/umh.c index b8fa9b99b366..3e4e453d45c8 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -26,8 +26,6 @@ #include <linux/ptrace.h> #include <linux/async.h> #include <linux/uaccess.h> -#include <linux/shmem_fs.h> -#include <linux/pipe_fs_i.h> #include <trace/events/module.h> @@ -38,8 +36,6 @@ static kernel_cap_t usermodehelper_bset = CAP_FULL_SET; static kernel_cap_t 
usermodehelper_inheritable = CAP_FULL_SET; static DEFINE_SPINLOCK(umh_sysctl_lock); static DECLARE_RWSEM(umhelper_sem); -static LIST_HEAD(umh_list); -static DEFINE_MUTEX(umh_list_lock); static void call_usermodehelper_freeinfo(struct subprocess_info *info) { @@ -402,121 +398,6 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, } EXPORT_SYMBOL(call_usermodehelper_setup); -static int umd_setup(struct subprocess_info *info, struct cred *new) -{ - struct umh_info *umh_info = info->data; - struct file *from_umh[2]; - struct file *to_umh[2]; - int err; - - /* create pipe to send data to umh */ - err = create_pipe_files(to_umh, 0); - if (err) - return err; - err = replace_fd(0, to_umh[0], 0); - fput(to_umh[0]); - if (err < 0) { - fput(to_umh[1]); - return err; - } - - /* create pipe to receive data from umh */ - err = create_pipe_files(from_umh, 0); - if (err) { - fput(to_umh[1]); - replace_fd(0, NULL, 0); - return err; - } - err = replace_fd(1, from_umh[1], 0); - fput(from_umh[1]); - if (err < 0) { - fput(to_umh[1]); - replace_fd(0, NULL, 0); - fput(from_umh[0]); - return err; - } - - umh_info->pipe_to_umh = to_umh[1]; - umh_info->pipe_from_umh = from_umh[0]; - umh_info->pid = task_pid_nr(current); - current->flags |= PF_UMH; - return 0; -} - -static void umd_cleanup(struct subprocess_info *info) -{ - struct umh_info *umh_info = info->data; - - /* cleanup if umh_setup() was successful but exec failed */ - if (info->retval) { - fput(umh_info->pipe_to_umh); - fput(umh_info->pipe_from_umh); - } -} - -/** - * fork_usermode_blob - fork a blob of bytes as a usermode process - * @data: a blob of bytes that can be do_execv-ed as a file - * @len: length of the blob - * @info: information about usermode process (shouldn't be NULL) - * - * If info->cmdline is set it will be used as command line for the - * user process, else "usermodehelper" is used. 
- * - * Returns either negative error or zero which indicates success - * in executing a blob of bytes as a usermode process. In such - * case 'struct umh_info *info' is populated with two pipes - * and a pid of the process. The caller is responsible for health - * check of the user process, killing it via pid, and closing the - * pipes when user process is no longer needed. - */ -int fork_usermode_blob(void *data, size_t len, struct umh_info *info) -{ - const char *cmdline = (info->cmdline) ? info->cmdline : "usermodehelper"; - struct subprocess_info *sub_info; - char **argv = NULL; - struct file *file; - ssize_t written; - loff_t pos = 0; - int err; - - file = shmem_kernel_file_setup("", len, 0); - if (IS_ERR(file)) - return PTR_ERR(file); - - written = kernel_write(file, data, len, &pos); - if (written != len) { - err = written; - if (err >= 0) - err = -ENOMEM; - goto out; - } - - err = -ENOMEM; - argv = argv_split(GFP_KERNEL, cmdline, NULL); - if (!argv) - goto out; - - sub_info = call_usermodehelper_setup("none", argv, NULL, GFP_KERNEL, - umd_setup, umd_cleanup, info); - if (!sub_info) - goto out; - - sub_info->file = file; - err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); - if (!err) { - mutex_lock(&umh_list_lock); - list_add(&info->list, &umh_list); - mutex_unlock(&umh_list_lock); - } -out: - if (argv) - argv_free(argv); - fput(file); - return err; -} -EXPORT_SYMBOL_GPL(fork_usermode_blob); - /** * call_usermodehelper_exec - start a usermode application * @sub_info: information about the subprocessa @@ -678,26 +559,6 @@ static int proc_cap_handler(struct ctl_table *table, int write, return 0; } -void __exit_umh(struct task_struct *tsk) -{ - struct umh_info *info; - pid_t pid = tsk->pid; - - mutex_lock(&umh_list_lock); - list_for_each_entry(info, &umh_list, list) { - if (info->pid == pid) { - list_del(&info->list); - mutex_unlock(&umh_list_lock); - goto out; - } - } - mutex_unlock(&umh_list_lock); - return; -out: - if (info->cleanup) - 
info->cleanup(info); -} - struct ctl_table usermodehelper_table[] = { { .procname = "bset", diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c new file mode 100644 index 000000000000..5b05863af855 --- /dev/null +++ b/kernel/usermode_driver.c @@ -0,0 +1,146 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * umd - User mode driver support + */ +#include <linux/shmem_fs.h> +#include <linux/pipe_fs_i.h> +#include <linux/usermode_driver.h> + +static LIST_HEAD(umh_list); +static DEFINE_MUTEX(umh_list_lock); + +static int umd_setup(struct subprocess_info *info, struct cred *new) +{ + struct umh_info *umh_info = info->data; + struct file *from_umh[2]; + struct file *to_umh[2]; + int err; + + /* create pipe to send data to umh */ + err = create_pipe_files(to_umh, 0); + if (err) + return err; + err = replace_fd(0, to_umh[0], 0); + fput(to_umh[0]); + if (err < 0) { + fput(to_umh[1]); + return err; + } + + /* create pipe to receive data from umh */ + err = create_pipe_files(from_umh, 0); + if (err) { + fput(to_umh[1]); + replace_fd(0, NULL, 0); + return err; + } + err = replace_fd(1, from_umh[1], 0); + fput(from_umh[1]); + if (err < 0) { + fput(to_umh[1]); + replace_fd(0, NULL, 0); + fput(from_umh[0]); + return err; + } + + umh_info->pipe_to_umh = to_umh[1]; + umh_info->pipe_from_umh = from_umh[0]; + umh_info->pid = task_pid_nr(current); + current->flags |= PF_UMH; + return 0; +} + +static void umd_cleanup(struct subprocess_info *info) +{ + struct umh_info *umh_info = info->data; + + /* cleanup if umh_setup() was successful but exec failed */ + if (info->retval) { + fput(umh_info->pipe_to_umh); + fput(umh_info->pipe_from_umh); + } +} + +/** + * fork_usermode_blob - fork a blob of bytes as a usermode process + * @data: a blob of bytes that can be do_execv-ed as a file + * @len: length of the blob + * @info: information about usermode process (shouldn't be NULL) + * + * If info->cmdline is set it will be used as command line for the + * user process, else 
"usermodehelper" is used. + * + * Returns either negative error or zero which indicates success + * in executing a blob of bytes as a usermode process. In such + * case 'struct umh_info *info' is populated with two pipes + * and a pid of the process. The caller is responsible for health + * check of the user process, killing it via pid, and closing the + * pipes when user process is no longer needed. + */ +int fork_usermode_blob(void *data, size_t len, struct umh_info *info) +{ + const char *cmdline = (info->cmdline) ? info->cmdline : "usermodehelper"; + struct subprocess_info *sub_info; + char **argv = NULL; + struct file *file; + ssize_t written; + loff_t pos = 0; + int err; + + file = shmem_kernel_file_setup("", len, 0); + if (IS_ERR(file)) + return PTR_ERR(file); + + written = kernel_write(file, data, len, &pos); + if (written != len) { + err = written; + if (err >= 0) + err = -ENOMEM; + goto out; + } + + err = -ENOMEM; + argv = argv_split(GFP_KERNEL, cmdline, NULL); + if (!argv) + goto out; + + sub_info = call_usermodehelper_setup("none", argv, NULL, GFP_KERNEL, + umd_setup, umd_cleanup, info); + if (!sub_info) + goto out; + + sub_info->file = file; + err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); + if (!err) { + mutex_lock(&umh_list_lock); + list_add(&info->list, &umh_list); + mutex_unlock(&umh_list_lock); + } +out: + if (argv) + argv_free(argv); + fput(file); + return err; +} +EXPORT_SYMBOL_GPL(fork_usermode_blob); + +void __exit_umh(struct task_struct *tsk) +{ + struct umh_info *info; + pid_t pid = tsk->pid; + + mutex_lock(&umh_list_lock); + list_for_each_entry(info, &umh_list, list) { + if (info->pid == pid) { + list_del(&info->list); + mutex_unlock(&umh_list_lock); + goto out; + } + } + mutex_unlock(&umh_list_lock); + return; +out: + if (info->cleanup) + info->cleanup(info); +} + -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
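The pipe wiring that umd_setup performs (one pipe feeding the driver's fd 0, one pipe draining its fd 1, with the kernel side keeping the opposite ends) has a close userspace analogue. The sketch below is an illustration under stated assumptions, not the kernel implementation: it uses pipe(), fork() and dup2() where the kernel uses create_pipe_files() and replace_fd(), the child runs a trivial echo loop instead of exec'ing a driver blob, and the sketch_* names are hypothetical. It also assumes fds 0 to 2 are already open in the parent.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct sketch_umd {
	int pipe_to_child;   /* parent writes here; it is the child's stdin */
	int pipe_from_child; /* parent reads here; it is the child's stdout */
	pid_t pid;
};

/* Child body for the sketch: copy stdin to stdout until EOF, then exit. */
static void sketch_child_echo(void)
{
	char buf[256];
	ssize_t n;

	while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0) {
		if (write(STDOUT_FILENO, buf, (size_t)n) != n)
			_exit(1);
	}
	_exit(n < 0);
}

/* Returns 0 on success and fills *info, or -1 with nothing left open. */
static int sketch_spawn(struct sketch_umd *info)
{
	int to_child[2], from_child[2];
	pid_t pid;

	assert(info != NULL);
	if (pipe(to_child) < 0)
		return -1;
	if (pipe(from_child) < 0) {
		close(to_child[0]);
		close(to_child[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(to_child[0]);
		close(to_child[1]);
		close(from_child[0]);
		close(from_child[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: the analogue of replace_fd(0, ...) and replace_fd(1, ...). */
		if (dup2(to_child[0], STDIN_FILENO) < 0 ||
		    dup2(from_child[1], STDOUT_FILENO) < 0)
			_exit(127);
		close(to_child[0]);
		close(to_child[1]);
		close(from_child[0]);
		close(from_child[1]);
		sketch_child_echo(); /* never returns */
	}

	/* Parent: drop the child's ends, keep the opposite ones. */
	close(to_child[0]);
	close(from_child[1]);
	info->pipe_to_child = to_child[1];
	info->pipe_from_child = from_child[0];
	info->pid = pid;
	return 0;
}

/* Close the pipes and reap the child; returns its exit status or -1. */
static int sketch_reap(struct sketch_umd *info)
{
	int status;

	close(info->pipe_to_child); /* child sees EOF on stdin and exits */
	if (waitpid(info->pid, &status, 0) != info->pid)
		return -1;
	close(info->pipe_from_child);
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

As in the kernel code, every error path releases whatever was created before the failure, so a failed spawn leaves no stray descriptors behind in the parent.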
* [PATCH v3 06/16] umd: For clarity rename umh_info umd_info 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (4 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 05/16] umh: Separate the user mode driver and the user mode helper support Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 07/16] umd: Rename umd_info.cmdline umd_info.driver_name Eric W. Biederman ` (11 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman This structure is only used for user mode drivers so change the prefix from umh to umd to make that clear. v1: https://lkml.kernel.org/r/87o8p6f0kw.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/878sg563po.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/bpfilter.h | 2 +- include/linux/usermode_driver.h | 6 +++--- kernel/usermode_driver.c | 20 ++++++++++---------- net/ipv4/bpfilter/sockopt.c | 2 +- 4 files changed, 15 insertions(+), 15 deletions(-) diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h index d6d6206052a6..ec9972d822e0 100644 --- a/include/linux/bpfilter.h +++ b/include/linux/bpfilter.h @@ -11,7 +11,7 @@ int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval, int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval, int __user *optlen); struct bpfilter_umh_ops { - struct umh_info info; + struct umd_info info; /* since ip_getsockopt() can run in parallel, serialize access to umh */ struct mutex lock; int (*sockopt)(struct sock *sk, int optname, diff --git a/include/linux/usermode_driver.h b/include/linux/usermode_driver.h index c5f6dc950227..7131ea611bab 100644 --- a/include/linux/usermode_driver.h +++ b/include/linux/usermode_driver.h @@ -17,14 +17,14 @@ static inline void exit_umh(struct task_struct *tsk) } #endif -struct umh_info { +struct umd_info { const char *cmdline; struct file *pipe_to_umh; struct file *pipe_from_umh; struct list_head list; - void (*cleanup)(struct umh_info *info); + void (*cleanup)(struct umd_info *info); pid_t pid; }; -int fork_usermode_blob(void *data, size_t len, struct umh_info *info); +int fork_usermode_blob(void *data, size_t len, struct umd_info *info); #endif /* __LINUX_USERMODE_DRIVER_H__ */ diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c index 5b05863af855..e73550e946d6 100644 --- a/kernel/usermode_driver.c +++ b/kernel/usermode_driver.c @@ -11,7 +11,7 @@ static DEFINE_MUTEX(umh_list_lock); static int umd_setup(struct subprocess_info *info, struct cred *new) { - struct umh_info *umh_info = info->data; + struct umd_info *umd_info = info->data; struct file *from_umh[2]; struct file *to_umh[2]; int err; @@ -43,21 +43,21 @@ static int 
umd_setup(struct subprocess_info *info, struct cred *new) return err; } - umh_info->pipe_to_umh = to_umh[1]; - umh_info->pipe_from_umh = from_umh[0]; - umh_info->pid = task_pid_nr(current); + umd_info->pipe_to_umh = to_umh[1]; + umd_info->pipe_from_umh = from_umh[0]; + umd_info->pid = task_pid_nr(current); current->flags |= PF_UMH; return 0; } static void umd_cleanup(struct subprocess_info *info) { - struct umh_info *umh_info = info->data; + struct umd_info *umd_info = info->data; /* cleanup if umh_setup() was successful but exec failed */ if (info->retval) { - fput(umh_info->pipe_to_umh); - fput(umh_info->pipe_from_umh); + fput(umd_info->pipe_to_umh); + fput(umd_info->pipe_from_umh); } } @@ -72,12 +72,12 @@ static void umd_cleanup(struct subprocess_info *info) * * Returns either negative error or zero which indicates success * in executing a blob of bytes as a usermode process. In such - * case 'struct umh_info *info' is populated with two pipes + * case 'struct umd_info *info' is populated with two pipes * and a pid of the process. The caller is responsible for health * check of the user process, killing it via pid, and closing the * pipes when user process is no longer needed. */ -int fork_usermode_blob(void *data, size_t len, struct umh_info *info) +int fork_usermode_blob(void *data, size_t len, struct umd_info *info) { const char *cmdline = (info->cmdline) ? 
info->cmdline : "usermodehelper"; struct subprocess_info *sub_info; @@ -126,7 +126,7 @@ EXPORT_SYMBOL_GPL(fork_usermode_blob); void __exit_umh(struct task_struct *tsk) { - struct umh_info *info; + struct umd_info *info; pid_t pid = tsk->pid; mutex_lock(&umh_list_lock); diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c index 0480918bfc7c..c0dbcc86fcdb 100644 --- a/net/ipv4/bpfilter/sockopt.c +++ b/net/ipv4/bpfilter/sockopt.c @@ -12,7 +12,7 @@ struct bpfilter_umh_ops bpfilter_ops; EXPORT_SYMBOL_GPL(bpfilter_ops); -static void bpfilter_umh_cleanup(struct umh_info *info) +static void bpfilter_umh_cleanup(struct umd_info *info) { mutex_lock(&bpfilter_ops.lock); bpfilter_ops.stop = true; -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v3 07/16] umd: Rename umd_info.cmdline umd_info.driver_name 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (5 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 06/16] umd: For clarity rename umh_info umd_info Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 08/16] umd: Transform fork_usermode_blob into fork_usermode_driver Eric W. Biederman ` (10 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman The only thing supplied in the cmdline today is the driver name, so rename the field to clarify the code. As this value is always supplied, stop trying to handle the case of a NULL cmdline. Additionally, since we now have a name we can count on, use the driver_name any place where the code is looking for the name of the binary. v1: https://lkml.kernel.org/r/87imfef0k3.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87366d63os.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/usermode_driver.h | 2 +- kernel/usermode_driver.c | 11 ++++------- net/ipv4/bpfilter/sockopt.c | 2 +- 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/include/linux/usermode_driver.h b/include/linux/usermode_driver.h index 7131ea611bab..48cf25e3145d 100644 --- a/include/linux/usermode_driver.h +++ b/include/linux/usermode_driver.h @@ -18,7 +18,7 @@ static inline void exit_umh(struct task_struct *tsk) #endif struct umd_info { - const char *cmdline; + const char *driver_name; struct file *pipe_to_umh; struct file *pipe_from_umh; struct list_head list; diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c index e73550e946d6..46d60d855e93 100644 --- a/kernel/usermode_driver.c +++ b/kernel/usermode_driver.c @@ -67,9 +67,6 @@ static void umd_cleanup(struct subprocess_info *info) * @len: length of the blob * @info: information about usermode process (shouldn't be NULL) * - * If info->cmdline is set it will be used as command line for the - * user process, else "usermodehelper" is used. - * * Returns either negative error or zero which indicates success * in executing a blob of bytes as a usermode process. In such * case 'struct umd_info *info' is populated with two pipes @@ -79,7 +76,6 @@ static void umd_cleanup(struct subprocess_info *info) */ int fork_usermode_blob(void *data, size_t len, struct umd_info *info) { - const char *cmdline = (info->cmdline) ? 
info->cmdline : "usermodehelper"; struct subprocess_info *sub_info; char **argv = NULL; struct file *file; @@ -87,7 +83,7 @@ int fork_usermode_blob(void *data, size_t len, struct umd_info *info) loff_t pos = 0; int err; - file = shmem_kernel_file_setup("", len, 0); + file = shmem_kernel_file_setup(info->driver_name, len, 0); if (IS_ERR(file)) return PTR_ERR(file); @@ -100,11 +96,12 @@ int fork_usermode_blob(void *data, size_t len, struct umd_info *info) } err = -ENOMEM; - argv = argv_split(GFP_KERNEL, cmdline, NULL); + argv = argv_split(GFP_KERNEL, info->driver_name, NULL); if (!argv) goto out; - sub_info = call_usermodehelper_setup("none", argv, NULL, GFP_KERNEL, + sub_info = call_usermodehelper_setup(info->driver_name, argv, NULL, + GFP_KERNEL, umd_setup, umd_cleanup, info); if (!sub_info) goto out; diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c index c0dbcc86fcdb..5050de28333d 100644 --- a/net/ipv4/bpfilter/sockopt.c +++ b/net/ipv4/bpfilter/sockopt.c @@ -70,7 +70,7 @@ static int __init bpfilter_sockopt_init(void) { mutex_init(&bpfilter_ops.lock); bpfilter_ops.stop = true; - bpfilter_ops.info.cmdline = "bpfilter_umh"; + bpfilter_ops.info.driver_name = "bpfilter_umh"; bpfilter_ops.info.cleanup = &bpfilter_umh_cleanup; return 0; -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v3 08/16] umd: Transform fork_usermode_blob into fork_usermode_driver 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (6 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 07/16] umd: Rename umd_info.cmdline umd_info.driver_name Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 09/16] umh: Stop calling do_execve_file Eric W. Biederman ` (9 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman Instead of loading a binary blob into a temporary file with shmem_kernel_file_setup, load a binary blob into a temporary tmpfs filesystem. This means that the blob can be stored in an init section and discarded, and it means the binary blob will have a filename so it can be executed normally. The only tricky thing about this code is that in the helper function blob_to_mnt __fput_sync is used. That is because a file can not be executed if it is still open for write, and the ordinary delayed close for kernel threads does not happen soon enough, which causes the following exec to fail. The function umd_load_blob is not called with any locks so this should be safe. Executing the blob normally winds up correcting several problems with the user mode driver code discovered by Tetsuo Handa[1]. By passing an ordinary filename into the exec, it is no longer necessary to figure out how to turn an O_RDWR file descriptor into a properly reference counted O_EXEC file descriptor that forbids all writes. For path based LSMs there are no new special cases. 
[1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ v1: https://lkml.kernel.org/r/87d05mf0j9.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87wo3p4p35.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- include/linux/usermode_driver.h | 6 +- kernel/usermode_driver.c | 126 ++++++++++++++++++++++++-------- net/bpfilter/bpfilter_kern.c | 14 +++- 3 files changed, 113 insertions(+), 33 deletions(-) diff --git a/include/linux/usermode_driver.h b/include/linux/usermode_driver.h index 48cf25e3145d..97c919b7147c 100644 --- a/include/linux/usermode_driver.h +++ b/include/linux/usermode_driver.h @@ -2,6 +2,7 @@ #define __LINUX_USERMODE_DRIVER_H__ #include <linux/umh.h> +#include <linux/path.h> #ifdef CONFIG_BPFILTER void __exit_umh(struct task_struct *tsk); @@ -23,8 +24,11 @@ struct umd_info { struct file *pipe_from_umh; struct list_head list; void (*cleanup)(struct umd_info *info); + struct path wd; pid_t pid; }; -int fork_usermode_blob(void *data, size_t len, struct umd_info *info); +int umd_load_blob(struct umd_info *info, const void *data, size_t len); +int umd_unload_blob(struct umd_info *info); +int fork_usermode_driver(struct umd_info *info); #endif /* __LINUX_USERMODE_DRIVER_H__ */ diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c index 46d60d855e93..a86798759f83 100644 --- a/kernel/usermode_driver.c +++ b/kernel/usermode_driver.c @@ -4,11 +4,98 @@ */ #include <linux/shmem_fs.h> #include <linux/pipe_fs_i.h> +#include <linux/mount.h> +#include <linux/fs_struct.h> +#include <linux/task_work.h> #include <linux/usermode_driver.h> static LIST_HEAD(umh_list); static DEFINE_MUTEX(umh_list_lock); +static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *name) +{ + struct file_system_type *type; + struct vfsmount *mnt; + struct file *file; + ssize_t written; + loff_t pos = 0; + + 
type = get_fs_type("tmpfs"); + if (!type) + return ERR_PTR(-ENODEV); + + mnt = kern_mount(type); + put_filesystem(type); + if (IS_ERR(mnt)) + return mnt; + + file = file_open_root(mnt->mnt_root, mnt, name, O_CREAT | O_WRONLY, 0700); + if (IS_ERR(file)) { + mntput(mnt); + return ERR_CAST(file); + } + + written = kernel_write(file, data, len, &pos); + if (written != len) { + int err = written; + if (err >= 0) + err = -ENOMEM; + filp_close(file, NULL); + mntput(mnt); + return ERR_PTR(err); + } + + fput(file); + + /* Flush delayed fput so exec can open the file read-only */ + flush_delayed_fput(); + task_work_run(); + return mnt; +} + +/** + * umd_load_blob - Remember a blob of bytes for fork_usermode_driver + * @info: information about usermode driver + * @data: a blob of bytes that can be executed as a file + * @len: The lentgh of the blob + * + */ +int umd_load_blob(struct umd_info *info, const void *data, size_t len) +{ + struct vfsmount *mnt; + + if (WARN_ON_ONCE(info->wd.dentry || info->wd.mnt)) + return -EBUSY; + + mnt = blob_to_mnt(data, len, info->driver_name); + if (IS_ERR(mnt)) + return PTR_ERR(mnt); + + info->wd.mnt = mnt; + info->wd.dentry = mnt->mnt_root; + return 0; +} +EXPORT_SYMBOL_GPL(umd_load_blob); + +/** + * umd_unload_blob - Disassociate @info from a previously loaded blob + * @info: information about usermode driver + * + */ +int umd_unload_blob(struct umd_info *info) +{ + if (WARN_ON_ONCE(!info->wd.mnt || + !info->wd.dentry || + info->wd.mnt->mnt_root != info->wd.dentry)) + return -EINVAL; + + kern_unmount(info->wd.mnt); + info->wd.mnt = NULL; + info->wd.dentry = NULL; + return 0; +} +EXPORT_SYMBOL_GPL(umd_unload_blob); + static int umd_setup(struct subprocess_info *info, struct cred *new) { struct umd_info *umd_info = info->data; @@ -43,6 +130,7 @@ static int umd_setup(struct subprocess_info *info, struct cred *new) return err; } + set_fs_pwd(current->fs, &umd_info->wd); umd_info->pipe_to_umh = to_umh[1]; umd_info->pipe_from_umh = from_umh[0]; 
umd_info->pid = task_pid_nr(current); @@ -62,39 +150,21 @@ static void umd_cleanup(struct subprocess_info *info) } /** - * fork_usermode_blob - fork a blob of bytes as a usermode process - * @data: a blob of bytes that can be do_execv-ed as a file - * @len: length of the blob - * @info: information about usermode process (shouldn't be NULL) + * fork_usermode_driver - fork a usermode driver + * @info: information about usermode driver (shouldn't be NULL) * - * Returns either negative error or zero which indicates success - * in executing a blob of bytes as a usermode process. In such - * case 'struct umd_info *info' is populated with two pipes - * and a pid of the process. The caller is responsible for health - * check of the user process, killing it via pid, and closing the - * pipes when user process is no longer needed. + * Returns either negative error or zero which indicates success in + * executing a usermode driver. In such case 'struct umd_info *info' + * is populated with two pipes and a pid of the process. The caller is + * responsible for health check of the user process, killing it via + * pid, and closing the pipes when user process is no longer needed. 
*/ -int fork_usermode_blob(void *data, size_t len, struct umd_info *info) +int fork_usermode_driver(struct umd_info *info) { struct subprocess_info *sub_info; char **argv = NULL; - struct file *file; - ssize_t written; - loff_t pos = 0; int err; - file = shmem_kernel_file_setup(info->driver_name, len, 0); - if (IS_ERR(file)) - return PTR_ERR(file); - - written = kernel_write(file, data, len, &pos); - if (written != len) { - err = written; - if (err >= 0) - err = -ENOMEM; - goto out; - } - err = -ENOMEM; argv = argv_split(GFP_KERNEL, info->driver_name, NULL); if (!argv) @@ -106,7 +176,6 @@ int fork_usermode_blob(void *data, size_t len, struct umd_info *info) if (!sub_info) goto out; - sub_info->file = file; err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); if (!err) { mutex_lock(&umh_list_lock); @@ -116,10 +185,9 @@ int fork_usermode_blob(void *data, size_t len, struct umd_info *info) out: if (argv) argv_free(argv); - fput(file); return err; } -EXPORT_SYMBOL_GPL(fork_usermode_blob); +EXPORT_SYMBOL_GPL(fork_usermode_driver); void __exit_umh(struct task_struct *tsk) { diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c index c0f0990f30b6..28883b00609d 100644 --- a/net/bpfilter/bpfilter_kern.c +++ b/net/bpfilter/bpfilter_kern.c @@ -77,9 +77,7 @@ static int start_umh(void) int err; /* fork usermode process */ - err = fork_usermode_blob(&bpfilter_umh_start, - &bpfilter_umh_end - &bpfilter_umh_start, - &bpfilter_ops.info); + err = fork_usermode_driver(&bpfilter_ops.info); if (err) return err; bpfilter_ops.stop = false; @@ -98,6 +96,12 @@ static int __init load_umh(void) { int err; + err = umd_load_blob(&bpfilter_ops.info, + &bpfilter_umh_start, + &bpfilter_umh_end - &bpfilter_umh_start); + if (err) + return err; + mutex_lock(&bpfilter_ops.lock); if (!bpfilter_ops.stop) { err = -EFAULT; @@ -110,6 +114,8 @@ static int __init load_umh(void) } out: mutex_unlock(&bpfilter_ops.lock); + if (err) + umd_unload_blob(&bpfilter_ops.info); return err; } @@ 
-122,6 +128,8 @@ static void __exit fini_umh(void) bpfilter_ops.sockopt = NULL; } mutex_unlock(&bpfilter_ops.lock); + + umd_unload_blob(&bpfilter_ops.info); } module_init(load_umh); module_exit(fini_umh); -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
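The load-once, start-many lifecycle that patch 08 introduces can be sketched in userspace C. This is only a toy model under stated assumptions: `toy_umd`, `toy_load`, `toy_fork`, `toy_unload` and `toy_restart_demo` are hypothetical names, a heap copy stands in for the file written into the tmpfs mount, and nothing is actually mounted or executed. The `-ENOENT` check in `toy_fork` is an assumption of the toy and is not taken from the patch. The sketch mirrors only the guard checks (-EBUSY on a second load, -EINVAL on unload without a prior load) and the point that a restart needs `info` alone, not the original blob.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct umd_info; not the kernel type. */
struct toy_umd {
	void *copy;	/* plays the role of the file inside the tmpfs mount */
	size_t len;
	int forks;	/* how many times the "driver" was started */
};

/* Like umd_load_blob: refuse a second load, otherwise keep a private copy. */
static int toy_load(struct toy_umd *info, const void *data, size_t len)
{
	if (info->copy)
		return -EBUSY;
	info->copy = malloc(len);
	if (!info->copy)
		return -ENOMEM;
	memcpy(info->copy, data, len);
	info->len = len;
	return 0;
}

/* Like fork_usermode_driver: takes only info, never the original blob. */
static int toy_fork(struct toy_umd *info)
{
	if (!info->copy)
		return -ENOENT;	/* assumption: the toy rejects an unloaded driver */
	info->forks++;
	return 0;
}

/* Like umd_unload_blob: refuse to unload what was never loaded. */
static int toy_unload(struct toy_umd *info)
{
	if (!info->copy)
		return -EINVAL;
	free(info->copy);
	info->copy = NULL;
	info->len = 0;
	return 0;
}

/* Load, wipe the source buffer, then start the driver twice from the copy. */
static int toy_restart_demo(void)
{
	char blob[4] = { 0x7f, 'E', 'L', 'F' };
	struct toy_umd info = { 0 };
	int ok;

	if (toy_load(&info, blob, sizeof(blob)))
		return -1;
	memset(blob, 0, sizeof(blob));	/* the "init section" is gone */
	if (toy_fork(&info) || toy_fork(&info)) {
		toy_unload(&info);
		return -1;
	}
	ok = info.forks == 2 && ((char *)info.copy)[1] == 'E';
	toy_unload(&info);
	return ok ? 0 : -1;
}
```

Calling `toy_restart_demo()` from a `main` returns 0 when both starts succeed after the source buffer has been wiped, which is the property that lets a later patch in the series move the blob back into init data.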
* [PATCH v3 09/16] umh: Stop calling do_execve_file 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (7 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 08/16] umd: Transform fork_usermode_blob into fork_usermode_driver Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 10/16] exec: Remove do_execve_file Eric W. Biederman ` (8 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman With the user mode driver code changed to not set subprocess_info.file there are no more users of subprocess_info.file. Remove this field from struct subprocess_info and remove the only user in call_usermodehelper_exec_async that would call do_execve_file instead of do_execve if file was set. v1: https://lkml.kernel.org/r/877dvuf0i7.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87r1tx4p2a.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com> --- include/linux/umh.h | 1 - kernel/umh.c | 10 +++------- 2 files changed, 3 insertions(+), 8 deletions(-) diff --git a/include/linux/umh.h b/include/linux/umh.h index 73173c4a07e5..244aff638220 100644 --- a/include/linux/umh.h +++ b/include/linux/umh.h @@ -22,7 +22,6 @@ struct subprocess_info { const char *path; char **argv; char **envp; - struct file *file; int wait; int retval; int (*init)(struct subprocess_info *info, struct cred *new); diff --git a/kernel/umh.c b/kernel/umh.c index 3e4e453d45c8..6ca2096298b9 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -98,13 +98,9 @@ static int call_usermodehelper_exec_async(void *data) commit_creds(new); - if (sub_info->file) - retval = do_execve_file(sub_info->file, - sub_info->argv, sub_info->envp); - else - retval = do_execve(getname_kernel(sub_info->path), - (const char __user *const __user *)sub_info->argv, - (const char __user *const __user *)sub_info->envp); + retval = do_execve(getname_kernel(sub_info->path), + (const char __user *const __user *)sub_info->argv, + (const char __user *const __user *)sub_info->envp); out: sub_info->retval = retval; /* -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v3 10/16] exec: Remove do_execve_file 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (8 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 09/16] umh: Stop calling do_execve_file Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-08 6:35 ` Luis Chamberlain 2020-07-12 21:02 ` Pavel Machek 2020-07-02 16:41 ` [PATCH v3 11/16] bpfilter: Move bpfilter_umh back into init data Eric W. Biederman ` (7 subsequent siblings) 17 siblings, 2 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Tetsuo Handa, Greg Kroah-Hartman Now that the last callser has been removed remove this code from exec. For anyone thinking of resurrecing do_execve_file please note that the code was buggy in several fundamental ways. - It did not ensure the file it was passed was read-only and that deny_write_access had been called on it. Which subtlely breaks invaniants in exec. - The caller of do_execve_file was expected to hold and put a reference to the file, but an extra reference for use by exec was not taken so that when exec put it's reference to the file an underflow occured on the file reference count. - The point of the interface was so that a pathname did not need to exist. Which breaks pathname based LSMs. Tetsuo Handa originally reported these issues[1]. While it was clear that deny_write_access was missing the fundamental incompatibility with the passed in O_RDWR filehandle was not immediately recognized. All of these issues were fixed by modifying the usermode driver code to have a path, so it did not need this hack. 
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ v1: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87lfk54p0m.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- fs/exec.c | 38 +++++++++----------------------------- include/linux/binfmts.h | 1 - 2 files changed, 9 insertions(+), 30 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index e6e8a9a70327..23dfbb820626 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1818,13 +1818,14 @@ static int exec_binprm(struct linux_binprm *bprm) /* * sys_execve() executes a new program. */ -static int __do_execve_file(int fd, struct filename *filename, - struct user_arg_ptr argv, - struct user_arg_ptr envp, - int flags, struct file *file) +static int do_execveat_common(int fd, struct filename *filename, + struct user_arg_ptr argv, + struct user_arg_ptr envp, + int flags) { char *pathbuf = NULL; struct linux_binprm *bprm; + struct file *file; struct files_struct *displaced; int retval; @@ -1863,8 +1864,7 @@ static int __do_execve_file(int fd, struct filename *filename, check_unsafe_exec(bprm); current->in_execve = 1; - if (!file) - file = do_open_execat(fd, filename, flags); + file = do_open_execat(fd, filename, flags); retval = PTR_ERR(file); if (IS_ERR(file)) goto out_unmark; @@ -1872,9 +1872,7 @@ static int __do_execve_file(int fd, struct filename *filename, sched_exec(); bprm->file = file; - if (!filename) { - bprm->filename = "none"; - } else if (fd == AT_FDCWD || filename->name[0] == '/') { + if (fd == AT_FDCWD || filename->name[0] == '/') { bprm->filename = filename->name; } else { if (filename->name[0] == '\0') @@ -1935,8 +1933,7 @@ static int __do_execve_file(int fd, struct filename *filename, task_numa_free(current, false); free_bprm(bprm); kfree(pathbuf); - if 
(filename) - putname(filename); + putname(filename); if (displaced) put_files_struct(displaced); return retval; @@ -1967,27 +1964,10 @@ static int __do_execve_file(int fd, struct filename *filename, if (displaced) reset_files_struct(displaced); out_ret: - if (filename) - putname(filename); + putname(filename); return retval; } -static int do_execveat_common(int fd, struct filename *filename, - struct user_arg_ptr argv, - struct user_arg_ptr envp, - int flags) -{ - return __do_execve_file(fd, filename, argv, envp, flags, NULL); -} - -int do_execve_file(struct file *file, void *__argv, void *__envp) -{ - struct user_arg_ptr argv = { .ptr.native = __argv }; - struct user_arg_ptr envp = { .ptr.native = __envp }; - - return __do_execve_file(AT_FDCWD, NULL, argv, envp, 0, file); -} - int do_execve(struct filename *filename, const char __user *const __user *__argv, const char __user *const __user *__envp) diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 4a20b7517dd0..7c27d7b57871 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -141,6 +141,5 @@ extern int do_execveat(int, struct filename *, const char __user * const __user *, const char __user * const __user *, int); -int do_execve_file(struct file *file, void *__argv, void *__envp); #endif /* _LINUX_BINFMTS_H */ -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
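The reference count underflow described in the second bullet of the commit message for patch 10 can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: `toy_file`, `toy_get`, `toy_put`, `toy_exec_buggy`, `toy_exec_balanced` and `toy_refs_after_caller` are hypothetical names, and a plain `int` stands in for the real file reference count. It shows only the accounting: the caller holds one reference and puts it, so an exec path that puts a reference it never took leaves the count one below zero.

```c
/* Hypothetical toy file object; only the counter matters here. */
struct toy_file {
	int refs;
};

static void toy_get(struct toy_file *f) { f->refs++; }
static void toy_put(struct toy_file *f) { f->refs--; }

/* Models the bug: exec drops a reference that it never took. */
static void toy_exec_buggy(struct toy_file *f)
{
	toy_put(f);
}

/* Models the usual rule: take a reference before keeping the pointer. */
static void toy_exec_balanced(struct toy_file *f)
{
	toy_get(f);
	toy_put(f);
}

/* The caller owns one reference and puts it after exec returns. */
static int toy_refs_after_caller(void (*exec_fn)(struct toy_file *))
{
	struct toy_file f = { .refs = 1 };

	exec_fn(&f);
	toy_put(&f);
	return f.refs;	/* 0 is balanced, negative is an underflow */
}
```

In this model `toy_refs_after_caller(toy_exec_buggy)` yields -1 and `toy_refs_after_caller(toy_exec_balanced)` yields 0. The series avoids the question entirely by giving exec a pathname, so exec opens the file itself and owns the reference it later drops.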
* Re: [PATCH v3 10/16] exec: Remove do_execve_file 2020-07-02 16:41 ` [PATCH v3 10/16] exec: Remove do_execve_file Eric W. Biederman @ 2020-07-08 6:35 ` Luis Chamberlain 2020-07-08 12:41 ` Luis Chamberlain 2020-07-12 21:02 ` Pavel Machek 1 sibling, 1 reply; 72+ messages in thread From: Luis Chamberlain @ 2020-07-08 6:35 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Linus Torvalds, Christian Brauner, Greg Kroah-Hartman On Thu, Jul 02, 2020 at 11:41:34AM -0500, Eric W. Biederman wrote: > Now that the last callser has been removed remove this code from exec. > > For anyone thinking of resurrecing do_execve_file please note that > the code was buggy in several fundamental ways. > > - It did not ensure the file it was passed was read-only and that > deny_write_access had been called on it. Which subtlely breaks > invaniants in exec. > > - The caller of do_execve_file was expected to hold and put a > reference to the file, but an extra reference for use by exec was > not taken so that when exec put it's reference to the file an > underflow occured on the file reference count. Maybe its my growing love with testing, but I'm going to have to partly blame here that we added a new API without any respective testing. Granted, I recall this this patch set could have used more wider review and a bit more patience... but just mentioning this so we try to avoid new api-without-testing with more reason in the future. But more importantly, *how* could we have caught this? Or how can we catch this sort of stuff better in the future? > - The point of the interface was so that a pathname did not need to > exist. Which breaks pathname based LSMs. 
Perhaps so but this fails to do justice to the LSM consideration done for the patch which added this during patch review [0], and I particularly recall I called out LSM folks to bring their ray guns out at this patch. It didn't get much attention. Let me recap a few points I think your commit log should somehow consider. You do as you please. Users of shmem_kernel_file_setup() spawned out of the desire to *avoid* LSMs since it didn't make sense in their case as their inodes are never exposed to userspace. Such is the case for ipc/shm.c and security/keys/big_key.c. Refer to commit c7277090927a5 ("security: shmem: implement kernel private shmem inodes") and then commit e1832f2923ec9 ("ipc: use private shmem or hugetlbfs inodes for shm segments"). And the umh module approach was doing: a) mapping data already extracted by the kernel somehow from a file, presumably from /lib/modules/ path somewhere, but again this is not visible to umc.c, as it just gets called with: fork_usermode_blob(void *data, size_t len, struct umh_info *info) b) Creating the respective tmpfs file with shmem_kernel_file_setup() c) Populating the file created and stuffing it with our data passed d) Calling do_execve_file() on it. So, although I was hoping LSM folks would chime in for things I may have missed during my patch review, my recollection from the patch thread was that, because of a), it in theory could skip out on dealing with LSMs. [0] https://lkml.kernel.org/r/20180509022526.hertzfpvy7apz6ny@ast-mbp Luis ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 10/16] exec: Remove do_execve_file 2020-07-08 6:35 ` Luis Chamberlain @ 2020-07-08 12:41 ` Luis Chamberlain 2020-07-08 13:08 ` Eric W. Biederman 0 siblings, 1 reply; 72+ messages in thread From: Luis Chamberlain @ 2020-07-08 12:41 UTC (permalink / raw) To: Eric W. Biederman, Alexei Starovoitov Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Linus Torvalds, Christian Brauner, Greg Kroah-Hartman On Wed, Jul 08, 2020 at 06:35:25AM +0000, Luis Chamberlain wrote: > On Thu, Jul 02, 2020 at 11:41:34AM -0500, Eric W. Biederman wrote: > > Now that the last callser has been removed remove this code from exec. > > > > For anyone thinking of resurrecing do_execve_file please note that > > the code was buggy in several fundamental ways. > > > > - It did not ensure the file it was passed was read-only and that > > deny_write_access had been called on it. Which subtlely breaks > > invaniants in exec. > > > > - The caller of do_execve_file was expected to hold and put a > > reference to the file, but an extra reference for use by exec was > > not taken so that when exec put it's reference to the file an > > underflow occured on the file reference count. > > Maybe its my growing love with testing, but I'm going to have to partly > blame here that we added a new API without any respective testing. > Granted, I recall this this patch set could have used more wider review > and a bit more patience... but just mentioning this so we try to avoid > new api-without-testing with more reason in the future. > > But more importantly, *how* could we have caught this? Or how can we > catch this sort of stuff better in the future? 
Of all the issues you pointed out with do_execve_file(), since upon review the assumption *by design* was that LSMs/etc would pick up issues with the file *prior* to processing, I think that this file reference count issue comes to my attention as the more serious issue which I wish we could address *first* before this crusade. So I have to ask, has anyone *really tried* to give a crack at fixing this refcount issue in a smaller way first? Alexei? I'm not opposed to the removal of do_execve_file(), however if there is a reproducible crash / issue with the existing user, this sledge hammer seems a bit overkill for older kernels. Luis ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 10/16] exec: Remove do_execve_file 2020-07-08 12:41 ` Luis Chamberlain @ 2020-07-08 13:08 ` Eric W. Biederman 2020-07-08 13:32 ` Luis Chamberlain 0 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-07-08 13:08 UTC (permalink / raw) To: Luis Chamberlain Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Linus Torvalds, Christian Brauner, Greg Kroah-Hartman Luis Chamberlain <mcgrof@kernel.org> writes: > On Wed, Jul 08, 2020 at 06:35:25AM +0000, Luis Chamberlain wrote: >> On Thu, Jul 02, 2020 at 11:41:34AM -0500, Eric W. Biederman wrote: >> > Now that the last callser has been removed remove this code from exec. >> > >> > For anyone thinking of resurrecing do_execve_file please note that >> > the code was buggy in several fundamental ways. >> > >> > - It did not ensure the file it was passed was read-only and that >> > deny_write_access had been called on it. Which subtlely breaks >> > invaniants in exec. >> > >> > - The caller of do_execve_file was expected to hold and put a >> > reference to the file, but an extra reference for use by exec was >> > not taken so that when exec put it's reference to the file an >> > underflow occured on the file reference count. >> >> Maybe its my growing love with testing, but I'm going to have to partly >> blame here that we added a new API without any respective testing. >> Granted, I recall this this patch set could have used more wider review >> and a bit more patience... but just mentioning this so we try to avoid >> new api-without-testing with more reason in the future. >> >> But more importantly, *how* could we have caught this? Or how can we >> catch this sort of stuff better in the future? 
> > Of all the issues you pointed out with do_execve_file(), since upon > review the assumption *by design* was that LSMs/etc would pick up issues > with the file *prior* to processing, I think that this file reference > count issue comes to my attention as the more serious issue which I > wish we could address *first* before this crusade. > > So I have to ask, has anyone *really tried* to give a crack at fixing > this refcount issue in a smaller way first? Alexei? > > I'm not opposed to the removal of do_execve_file(), however if there > is a reproducible crash / issue with the existing user, this sledge > hammer seems a bit overkill for older kernels. It does not matter for older kernels because there is exactly one user. That one user is just a place holder keeping the code alive until a real user comes along. For older kernels the solution is to just mark the bpfilter code broken in Kconfig and refuse to compile it. That is the trivial backportable fix if anyone wants one. Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 10/16] exec: Remove do_execve_file 2020-07-08 13:08 ` Eric W. Biederman @ 2020-07-08 13:32 ` Luis Chamberlain 0 siblings, 0 replies; 72+ messages in thread From: Luis Chamberlain @ 2020-07-08 13:32 UTC (permalink / raw) To: Eric W. Biederman Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Linus Torvalds, Christian Brauner, Greg Kroah-Hartman On Wed, Jul 08, 2020 at 08:08:09AM -0500, Eric W. Biederman wrote: > Luis Chamberlain <mcgrof@kernel.org> writes: > > > On Wed, Jul 08, 2020 at 06:35:25AM +0000, Luis Chamberlain wrote: > >> On Thu, Jul 02, 2020 at 11:41:34AM -0500, Eric W. Biederman wrote: > >> > Now that the last callser has been removed remove this code from exec. > >> > > >> > For anyone thinking of resurrecing do_execve_file please note that > >> > the code was buggy in several fundamental ways. > >> > > >> > - It did not ensure the file it was passed was read-only and that > >> > deny_write_access had been called on it. Which subtlely breaks > >> > invaniants in exec. > >> > > >> > - The caller of do_execve_file was expected to hold and put a > >> > reference to the file, but an extra reference for use by exec was > >> > not taken so that when exec put it's reference to the file an > >> > underflow occured on the file reference count. > >> > >> Maybe its my growing love with testing, but I'm going to have to partly > >> blame here that we added a new API without any respective testing. > >> Granted, I recall this this patch set could have used more wider review > >> and a bit more patience... but just mentioning this so we try to avoid > >> new api-without-testing with more reason in the future. > >> > >> But more importantly, *how* could we have caught this? Or how can we > >> catch this sort of stuff better in the future? 
> > > > Of all the issues you pointed out with do_execve_file(), since upon > > review the assumption *by design* was that LSMs/etc would pick up issues > > with the file *prior* to processing, I think that this file reference > > count issue comes to my attention as the more serious issue which I > > wish we could address *first* before this crusade. > > > > So I have to ask, has anyone *really tried* to give a crack at fixing > > this refcount issue in a smaller way first? Alexei? > > > > I'm not opposed to the removal of do_execve_file(), however if there > > is a reproducible crash / issue with the existing user, this sledge > > hammer seems a bit overkill for older kernels. > > It does not matter for older kernels because there is exactly one user. > That one user is just a place holder keeping the code alive until a real > user comes along. > > For older kernels the solution is to just mark the bpfilter code broken > in Kconfig and refuse to compile it. That is the trivial backportable > fix if anyone wants one. This seals the deal for me, thanks! Carry on, but hey, please add yourself to MAINTAINERS too :) Luis ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 10/16] exec: Remove do_execve_file 2020-07-02 16:41 ` [PATCH v3 10/16] exec: Remove do_execve_file Eric W. Biederman 2020-07-08 6:35 ` Luis Chamberlain @ 2020-07-12 21:02 ` Pavel Machek 1 sibling, 0 replies; 72+ messages in thread From: Pavel Machek @ 2020-07-12 21:02 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Greg Kroah-Hartman On Thu 2020-07-02 11:41:34, Eric W. Biederman wrote: > Now that the last callser has been removed remove this code from exec. Typo "caller". > For anyone thinking of resurrecing do_execve_file please note that resurrecting? > the code was buggy in several fundamental ways. > > - It did not ensure the file it was passed was read-only and that > deny_write_access had been called on it. Which subtlely breaks > invaniants in exec. subtly, invariants? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 72+ messages in thread
* [PATCH v3 11/16] bpfilter: Move bpfilter_umh back into init data 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (9 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 10/16] exec: Remove do_execve_file Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 12/16] umd: Track user space drivers with struct pid Eric W. Biederman ` (6 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman To allow for restarts 61fbf5933d42 ("net: bpfilter: restart bpfilter_umh when error occurred") moved the blob holding the userspace binary out of the init sections. Now that loading the blob into a filesystem is separate from executing the blob the blob no longer needs to live in .rodata to allow for restarting. So move the blob back to .init.rodata. v1: https://lkml.kernel.org/r/87sgeidlvq.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87ftad4ozc.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com> --- net/bpfilter/bpfilter_umh_blob.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/bpfilter/bpfilter_umh_blob.S b/net/bpfilter/bpfilter_umh_blob.S index 9ea6100dca87..40311d10d2f2 100644 --- a/net/bpfilter/bpfilter_umh_blob.S +++ b/net/bpfilter/bpfilter_umh_blob.S @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0 */ - .section .rodata, "a" + .section .init.rodata, "a" .global bpfilter_umh_start bpfilter_umh_start: .incbin "net/bpfilter/bpfilter_umh" -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v3 12/16] umd: Track user space drivers with struct pid 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (10 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 11/16] bpfilter: Move bpfilter_umh back into init data Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll Eric W. Biederman ` (5 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman Use struct pid instead of user space pid values that are prone to wrap around. In addition track the entire thread group instead of just the first thread that is started by exec. There are no multi-threaded user mode drivers today but there is nothing precluding user drivers from being multi-threaded, so it is just a good idea to track the entire process. Take a reference count on the tgids in question to make it possible to remove exit_umh in a future change. As a struct pid is available directly use kill_pid_info. The prior process signalling code was iffy in using a userspace pid known to be in the initial pid namespace and then looking up its task in whatever the current pid namespace is. It worked only because kernel threads always run in the initial pid namespace. As the tgid is now refcounted verify the tgid is NULL at the start of fork_usermode_driver to avoid the possibility of silent pid leaks.
v1: https://lkml.kernel.org/r/87mu4qdlv2.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/a70l4oy8.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- include/linux/usermode_driver.h | 2 +- kernel/exit.c | 3 ++- kernel/usermode_driver.c | 15 ++++++++++----- net/bpfilter/bpfilter_kern.c | 13 +++++-------- net/ipv4/bpfilter/sockopt.c | 3 ++- 5 files changed, 20 insertions(+), 16 deletions(-) diff --git a/include/linux/usermode_driver.h b/include/linux/usermode_driver.h index 97c919b7147c..45adbffb31d9 100644 --- a/include/linux/usermode_driver.h +++ b/include/linux/usermode_driver.h @@ -25,7 +25,7 @@ struct umd_info { struct list_head list; void (*cleanup)(struct umd_info *info); struct path wd; - pid_t pid; + struct pid *tgid; }; int umd_load_blob(struct umd_info *info, const void *data, size_t len); int umd_unload_blob(struct umd_info *info); diff --git a/kernel/exit.c b/kernel/exit.c index a081deea52ca..d3294b611df1 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -805,7 +805,8 @@ void __noreturn do_exit(long code) exit_task_namespaces(tsk); exit_task_work(tsk); exit_thread(tsk); - exit_umh(tsk); + if (group_dead) + exit_umh(tsk); /* * Flush inherited counters to the parent - before the parent diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c index a86798759f83..f77f8d7ce9e3 100644 --- a/kernel/usermode_driver.c +++ b/kernel/usermode_driver.c @@ -133,7 +133,7 @@ static int umd_setup(struct subprocess_info *info, struct cred *new) set_fs_pwd(current->fs, &umd_info->wd); umd_info->pipe_to_umh = to_umh[1]; umd_info->pipe_from_umh = from_umh[0]; - umd_info->pid = task_pid_nr(current); + umd_info->tgid = get_pid(task_tgid(current)); current->flags |= PF_UMH; return 0; } @@ -146,6 +146,8 @@ static void umd_cleanup(struct subprocess_info *info) if (info->retval) { fput(umd_info->pipe_to_umh); fput(umd_info->pipe_from_umh); + 
put_pid(umd_info->tgid); + umd_info->tgid = NULL; } } @@ -155,9 +157,9 @@ static void umd_cleanup(struct subprocess_info *info) * * Returns either negative error or zero which indicates success in * executing a usermode driver. In such case 'struct umd_info *info' - * is populated with two pipes and a pid of the process. The caller is + * is populated with two pipes and a tgid of the process. The caller is * responsible for health check of the user process, killing it via - * pid, and closing the pipes when user process is no longer needed. + * tgid, and closing the pipes when user process is no longer needed. */ int fork_usermode_driver(struct umd_info *info) { @@ -165,6 +167,9 @@ int fork_usermode_driver(struct umd_info *info) char **argv = NULL; int err; + if (WARN_ON_ONCE(info->tgid)) + return -EBUSY; + err = -ENOMEM; argv = argv_split(GFP_KERNEL, info->driver_name, NULL); if (!argv) @@ -192,11 +197,11 @@ EXPORT_SYMBOL_GPL(fork_usermode_driver); void __exit_umh(struct task_struct *tsk) { struct umd_info *info; - pid_t pid = tsk->pid; + struct pid *tgid = task_tgid(tsk); mutex_lock(&umh_list_lock); list_for_each_entry(info, &umh_list, list) { - if (info->pid == pid) { + if (info->tgid == tgid) { list_del(&info->list); mutex_unlock(&umh_list_lock); goto out; diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c index 28883b00609d..08ea77c2b137 100644 --- a/net/bpfilter/bpfilter_kern.c +++ b/net/bpfilter/bpfilter_kern.c @@ -15,16 +15,13 @@ extern char bpfilter_umh_end; static void shutdown_umh(void) { - struct task_struct *tsk; + struct umd_info *info = &bpfilter_ops.info; + struct pid *tgid = info->tgid; if (bpfilter_ops.stop) return; - tsk = get_pid_task(find_vpid(bpfilter_ops.info.pid), PIDTYPE_PID); - if (tsk) { - send_sig(SIGKILL, tsk, 1); - put_task_struct(tsk); - } + kill_pid(tgid, SIGKILL, 1); } static void __stop_umh(void) @@ -48,7 +45,7 @@ static int __bpfilter_process_sockopt(struct sock *sk, int optname, req.cmd = optname; req.addr = 
(long __force __user)optval; req.len = optlen; - if (!bpfilter_ops.info.pid) + if (!bpfilter_ops.info.tgid) goto out; n = __kernel_write(bpfilter_ops.info.pipe_to_umh, &req, sizeof(req), &pos); @@ -81,7 +78,7 @@ static int start_umh(void) if (err) return err; bpfilter_ops.stop = false; - pr_info("Loaded bpfilter_umh pid %d\n", bpfilter_ops.info.pid); + pr_info("Loaded bpfilter_umh pid %d\n", pid_nr(bpfilter_ops.info.tgid)); /* health check that usermode process started correctly */ if (__bpfilter_process_sockopt(NULL, 0, NULL, 0, 0) != 0) { diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c index 5050de28333d..56cbc43145f6 100644 --- a/net/ipv4/bpfilter/sockopt.c +++ b/net/ipv4/bpfilter/sockopt.c @@ -18,7 +18,8 @@ static void bpfilter_umh_cleanup(struct umd_info *info) bpfilter_ops.stop = true; fput(info->pipe_to_umh); fput(info->pipe_from_umh); - info->pid = 0; + put_pid(info->tgid); + info->tgid = NULL; mutex_unlock(&bpfilter_ops.lock); } -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
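Editorial sketch, not part of the posted patch: the ownership rule that patch 12/16 introduces for umd_info.tgid can be modelled in plain user-space C. Everything below is hypothetical (struct pid_model, struct umd_model and the model_* functions are stand-ins invented for illustration, and the plain int counter is not thread-safe the way the kernel's reference counting is). It only illustrates the pattern in the diff: take a reference when the driver starts, refuse to start again while a reference is still held, and drop the reference and clear the field on cleanup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's refcounted struct pid. */
struct pid_model {
	int refcount;	/* plain int: a single-threaded model only */
	int nr;		/* numeric pid, kept only for display purposes */
};

/* Hypothetical stand-in for struct umd_info; only tgid matters here. */
struct umd_model {
	struct pid_model *tgid;
};

/* Models get_pid(): take one more reference and return the same object. */
static struct pid_model *model_get_pid(struct pid_model *pid)
{
	if (pid)
		pid->refcount++;
	return pid;
}

/* Models put_pid(): drop one reference, free on the last one. */
void model_put_pid(struct pid_model *pid)
{
	if (!pid)
		return;
	assert(pid->refcount > 0);
	if (--pid->refcount == 0)
		free(pid);
}

/*
 * Models the guard added to fork_usermode_driver(): if a reference is
 * still stored, starting again would overwrite it and leak it, so the
 * call is refused with -EBUSY instead.
 */
int model_start(struct umd_model *info, struct pid_model *child)
{
	if (info->tgid)
		return -EBUSY;
	info->tgid = model_get_pid(child);
	return 0;
}

/* Models the cleanup paths: drop the reference, then clear the field. */
void model_cleanup(struct umd_model *info)
{
	model_put_pid(info->tgid);
	info->tgid = NULL;
}
```

In this model a second model_start() without an intervening model_cleanup() fails with -EBUSY and leaves the count unchanged, which is the "silent pid leak" the commit message says the WARN_ON_ONCE check is meant to rule out.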
* [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll
  2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman
                     ` (11 preceding siblings ...)
  2020-07-02 16:41   ` [PATCH v3 12/16] umd: Track user space drivers with struct pid Eric W. Biederman
@ 2020-07-02 16:41   ` Eric W. Biederman
  2020-07-03 20:30     ` Alexei Starovoitov
  2020-07-04 16:00     ` Christian Brauner
  2020-07-02 16:41   ` [PATCH v3 14/16] bpfilter: Take advantage of the facilities of struct pid Eric W. Biederman
                     ` (4 subsequent siblings)
  17 siblings, 2 replies; 72+ messages in thread
From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa,
	Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov,
	Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski,
	Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List,
	Casey Schaufler, Luis Chamberlain, Linus Torvalds,
	Christian Brauner, Eric W. Biederman

Create an independent helper thread_group_exited report return true
when all threads have passed exit_notify in do_exit. AKA all of the
threads are at least zombies and might be dead or completely gone.

Create this helper by taking the logic out of pidfd_poll where
it is already tested, and adding a missing READ_ONCE on
the read of task->exit_state.

I will be changing the user mode driver code to use this same logic
to know when a user mode driver needs to be restarted.

Place the new helper thread_group_exited in kernel/exit.c and
EXPORT it so it can be used by modules.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 include/linux/sched/signal.h |  2 ++
 kernel/exit.c                | 24 ++++++++++++++++++++++++
 kernel/fork.c                |  6 +-----
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 0ee5e696c5d8..1bad18a1d8ba 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -674,6 +674,8 @@ static inline int thread_group_empty(struct task_struct *p)
 #define delay_group_leader(p) \
 		(thread_group_leader(p) && !thread_group_empty(p))

+extern bool thread_group_exited(struct pid *pid);
+
 extern struct sighand_struct *__lock_task_sighand(struct task_struct *task,
 							unsigned long *flags);

diff --git a/kernel/exit.c b/kernel/exit.c
index d3294b611df1..a7f112feb0f6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1713,6 +1713,30 @@ COMPAT_SYSCALL_DEFINE5(waitid,
 }
 #endif

+/**
+ * thread_group_exited - check that a thread group has exited
+ * @pid: tgid of thread group to be checked.
+ *
+ * Test if thread group is has exited (all threads are zombies, dead
+ * or completely gone).
+ *
+ * Return: true if the thread group has exited. false otherwise.
+ */
+bool thread_group_exited(struct pid *pid)
+{
+	struct task_struct *task;
+	bool exited;
+
+	rcu_read_lock();
+	task = pid_task(pid, PIDTYPE_PID);
+	exited = !task ||
+		(READ_ONCE(task->exit_state) && thread_group_empty(task));
+	rcu_read_unlock();
+
+	return exited;
+}
+EXPORT_SYMBOL(thread_group_exited);
+
 __weak void abort(void)
 {
 	BUG();
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..bf215af7a904 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1787,22 +1787,18 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
  */
 static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts)
 {
-	struct task_struct *task;
 	struct pid *pid = file->private_data;
 	__poll_t poll_flags = 0;

 	poll_wait(file, &pid->wait_pidfd, pts);

-	rcu_read_lock();
-	task = pid_task(pid, PIDTYPE_PID);
 	/*
 	 * Inform pollers only when the whole thread group exits.
 	 * If the thread group leader exits before all other threads in the
 	 * group, then poll(2) should block, similar to the wait(2) family.
 	 */
-	if (!task || (task->exit_state && thread_group_empty(task)))
+	if (thread_group_exited(pid))
 		poll_flags = EPOLLIN | EPOLLRDNORM;
-	rcu_read_unlock();

 	return poll_flags;
 }
-- 
2.25.0

^ permalink raw reply related [flat|nested] 72+ messages in thread
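Editorial sketch, not part of the posted patch: the condition that pidfd_poll reports (and that patch 13/16 factors into thread_group_exited) can be observed from user space through pidfd_open(2) and poll(2). The helper names open_pidfd and pidfd_has_exited are made up for this illustration, and the fallback syscall number 434 is an assumption that holds on most architectures but not all (alpha differs). It needs Linux 5.3 or later; on older kernels open_pidfd() fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
/* Assumption: generic syscall number, valid on most architectures. */
#define SYS_pidfd_open 434
#endif

/*
 * Hypothetical helper: returns a pidfd referring to the process whose
 * thread-group leader is pid, or -1 with errno set on failure.
 */
int open_pidfd(pid_t pid)
{
	return (int)syscall(SYS_pidfd_open, pid, 0);
}

/*
 * Hypothetical helper: non-blocking check of the pidfd.
 * Returns 1 once the kernel reports the whole thread group as exited
 * (poll marks the pidfd readable), 0 while it is still running, and
 * -1 if poll itself fails.
 */
int pidfd_has_exited(int pidfd)
{
	struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
	int n = poll(&pfd, 1, 0);	/* timeout 0: do not block */

	if (n < 0)
		return -1;
	return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A caller would typically open the pidfd right after creating the child and then either poll it with a timeout or, as here, check it on demand. Because the descriptor refers to the process itself rather than to a number, it cannot end up pointing at an unrelated process after pid reuse, which is the same motivation patch 12/16 gives for storing a struct pid.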
* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll 2020-07-02 16:41 ` [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll Eric W. Biederman @ 2020-07-03 20:30 ` Alexei Starovoitov 2020-07-03 21:37 ` Eric W. Biederman 2020-07-04 16:00 ` Christian Brauner 1 sibling, 1 reply; 72+ messages in thread From: Alexei Starovoitov @ 2020-07-03 20:30 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner On Thu, Jul 02, 2020 at 11:41:37AM -0500, Eric W. Biederman wrote: > Create an independent helper thread_group_exited report return true > when all threads have passed exit_notify in do_exit. AKA all of the > threads are at least zombies and might be dead or completely gone. > > Create this helper by taking the logic out of pidfd_poll where > it is already tested, and adding a missing READ_ONCE on > the read of task->exit_state. > > I will be changing the user mode driver code to use this same logic > to know when a user mode driver needs to be restarted. > > Place the new helper thread_group_exited in kernel/exit.c and > EXPORT it so it can be used by modules. > > Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> > --- > include/linux/sched/signal.h | 2 ++ > kernel/exit.c | 24 ++++++++++++++++++++++++ > kernel/fork.c | 6 +----- > 3 files changed, 27 insertions(+), 5 deletions(-) > > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > index 0ee5e696c5d8..1bad18a1d8ba 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -674,6 +674,8 @@ static inline int thread_group_empty(struct task_struct *p) > #define delay_group_leader(p) \ > (thread_group_leader(p) && !thread_group_empty(p)) > > +extern bool thread_group_exited(struct pid *pid); > + > extern struct sighand_struct *__lock_task_sighand(struct task_struct *task, > unsigned long *flags); > > diff --git a/kernel/exit.c b/kernel/exit.c > index d3294b611df1..a7f112feb0f6 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -1713,6 +1713,30 @@ COMPAT_SYSCALL_DEFINE5(waitid, > } > #endif > > +/** > + * thread_group_exited - check that a thread group has exited > + * @pid: tgid of thread group to be checked. > + * > + * Test if thread group is has exited (all threads are zombies, dead > + * or completely gone). > + * > + * Return: true if the thread group has exited. false otherwise. > + */ > +bool thread_group_exited(struct pid *pid) > +{ > + struct task_struct *task; > + bool exited; > + > + rcu_read_lock(); > + task = pid_task(pid, PIDTYPE_PID); > + exited = !task || > + (READ_ONCE(task->exit_state) && thread_group_empty(task)); > + rcu_read_unlock(); > + > + return exited; > +} I'm not sure why you think READ_ONCE was missing. It's different in wait_consider_task() where READ_ONCE is needed because of multiple checks. Here it's done once. The rest all looks good to me. Tested with and without bpf_preload patches. Feel free to create a frozen branch with this set. btw I'll be offline starting tomorrow for a week. Will catch up with threads afterwards. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll 2020-07-03 20:30 ` Alexei Starovoitov @ 2020-07-03 21:37 ` Eric W. Biederman 2020-07-04 0:03 ` Alexei Starovoitov 2020-07-04 15:50 ` Christian Brauner 0 siblings, 2 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-03 21:37 UTC (permalink / raw) To: Alexei Starovoitov Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > On Thu, Jul 02, 2020 at 11:41:37AM -0500, Eric W. Biederman wrote: >> Create an independent helper thread_group_exited report return true >> when all threads have passed exit_notify in do_exit. AKA all of the >> threads are at least zombies and might be dead or completely gone. >> >> Create this helper by taking the logic out of pidfd_poll where >> it is already tested, and adding a missing READ_ONCE on >> the read of task->exit_state. >> >> I will be changing the user mode driver code to use this same logic >> to know when a user mode driver needs to be restarted. >> >> Place the new helper thread_group_exited in kernel/exit.c and >> EXPORT it so it can be used by modules. >> >> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> >> --- >> include/linux/sched/signal.h | 2 ++ >> kernel/exit.c | 24 ++++++++++++++++++++++++ >> kernel/fork.c | 6 +----- >> 3 files changed, 27 insertions(+), 5 deletions(-) >> >> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h >> index 0ee5e696c5d8..1bad18a1d8ba 100644 >> --- a/include/linux/sched/signal.h >> +++ b/include/linux/sched/signal.h >> @@ -674,6 +674,8 @@ static inline int thread_group_empty(struct task_struct *p) >> #define delay_group_leader(p) \ >> (thread_group_leader(p) && !thread_group_empty(p)) >> >> +extern bool thread_group_exited(struct pid *pid); >> + >> extern struct sighand_struct *__lock_task_sighand(struct task_struct *task, >> unsigned long *flags); >> >> diff --git a/kernel/exit.c b/kernel/exit.c >> index d3294b611df1..a7f112feb0f6 100644 >> --- a/kernel/exit.c >> +++ b/kernel/exit.c >> @@ -1713,6 +1713,30 @@ COMPAT_SYSCALL_DEFINE5(waitid, >> } >> #endif >> >> +/** >> + * thread_group_exited - check that a thread group has exited >> + * @pid: tgid of thread group to be checked. >> + * >> + * Test if thread group is has exited (all threads are zombies, dead >> + * or completely gone). >> + * >> + * Return: true if the thread group has exited. false otherwise. >> + */ >> +bool thread_group_exited(struct pid *pid) >> +{ >> + struct task_struct *task; >> + bool exited; >> + >> + rcu_read_lock(); >> + task = pid_task(pid, PIDTYPE_PID); >> + exited = !task || >> + (READ_ONCE(task->exit_state) && thread_group_empty(task)); >> + rcu_read_unlock(); >> + >> + return exited; >> +} > > I'm not sure why you think READ_ONCE was missing. > It's different in wait_consider_task() where READ_ONCE is needed because > of multiple checks. Here it's done once. In practice it probably has no effect on the generated code. But READ_ONCE is about telling the compiler not to be clever. Don't use tearing loads or stores etc. 
When all of the other readers are using READ_ONCE I just get nervous if we have a case that doesn't. > The rest all looks good to me. Tested with and without bpf_preload patches. > Feel free to create a frozen branch with this set. Can I have your Tested-by and Acked-by? > btw I'll be offline starting tomorrow for a week. > Will catch up with threads afterwards. Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll 2020-07-03 21:37 ` Eric W. Biederman @ 2020-07-04 0:03 ` Alexei Starovoitov 2020-07-04 15:50 ` Christian Brauner 1 sibling, 0 replies; 72+ messages in thread From: Alexei Starovoitov @ 2020-07-04 0:03 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner On Fri, Jul 03, 2020 at 04:37:47PM -0500, Eric W. Biederman wrote: > > > The rest all looks good to me. Tested with and without bpf_preload patches. > > Feel free to create a frozen branch with this set. > > Can I have your Tested-by and Acked-by? For the set: Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll 2020-07-03 21:37 ` Eric W. Biederman 2020-07-04 0:03 ` Alexei Starovoitov @ 2020-07-04 15:50 ` Christian Brauner 2020-07-07 17:09 ` Eric W. Biederman 1 sibling, 1 reply; 72+ messages in thread From: Christian Brauner @ 2020-07-04 15:50 UTC (permalink / raw) To: Eric W. Biederman Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On Fri, Jul 03, 2020 at 04:37:47PM -0500, Eric W. Biederman wrote: > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: > > > On Thu, Jul 02, 2020 at 11:41:37AM -0500, Eric W. Biederman wrote: > >> Create an independent helper thread_group_exited report return true > >> when all threads have passed exit_notify in do_exit. AKA all of the > >> threads are at least zombies and might be dead or completely gone. > >> > >> Create this helper by taking the logic out of pidfd_poll where > >> it is already tested, and adding a missing READ_ONCE on > >> the read of task->exit_state. > >> > >> I will be changing the user mode driver code to use this same logic > >> to know when a user mode driver needs to be restarted. > >> > >> Place the new helper thread_group_exited in kernel/exit.c and > >> EXPORT it so it can be used by modules. > >> > >> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> > >> --- > >> include/linux/sched/signal.h | 2 ++ > >> kernel/exit.c | 24 ++++++++++++++++++++++++ > >> kernel/fork.c | 6 +----- > >> 3 files changed, 27 insertions(+), 5 deletions(-) > >> > >> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > >> index 0ee5e696c5d8..1bad18a1d8ba 100644 > >> --- a/include/linux/sched/signal.h > >> +++ b/include/linux/sched/signal.h > >> @@ -674,6 +674,8 @@ static inline int thread_group_empty(struct task_struct *p) > >> #define delay_group_leader(p) \ > >> (thread_group_leader(p) && !thread_group_empty(p)) > >> > >> +extern bool thread_group_exited(struct pid *pid); > >> + > >> extern struct sighand_struct *__lock_task_sighand(struct task_struct *task, > >> unsigned long *flags); > >> > >> diff --git a/kernel/exit.c b/kernel/exit.c > >> index d3294b611df1..a7f112feb0f6 100644 > >> --- a/kernel/exit.c > >> +++ b/kernel/exit.c > >> @@ -1713,6 +1713,30 @@ COMPAT_SYSCALL_DEFINE5(waitid, > >> } > >> #endif > >> > >> +/** > >> + * thread_group_exited - check that a thread group has exited > >> + * @pid: tgid of thread group to be checked. > >> + * > >> + * Test if thread group is has exited (all threads are zombies, dead > >> + * or completely gone). > >> + * > >> + * Return: true if the thread group has exited. false otherwise. > >> + */ > >> +bool thread_group_exited(struct pid *pid) > >> +{ > >> + struct task_struct *task; > >> + bool exited; > >> + > >> + rcu_read_lock(); > >> + task = pid_task(pid, PIDTYPE_PID); > >> + exited = !task || > >> + (READ_ONCE(task->exit_state) && thread_group_empty(task)); > >> + rcu_read_unlock(); > >> + > >> + return exited; > >> +} > > > > I'm not sure why you think READ_ONCE was missing. > > It's different in wait_consider_task() where READ_ONCE is needed because > > of multiple checks. Here it's done once. > > In practice it probably has no effect on the generated code. But > READ_ONCE is about telling the compiler not to be clever. 
Don't use > tearing loads or stores etc. When all of the other readers are using > READ_ONCE I just get nervous if we have a case that doesn't. That's not true. The only place where READ_ONCE(->exit_state) is used is in wait_consider_task() and nowhere else. We had that discussion a while ago where I or someone proposed to simply place a READ_ONCE() around all accesses to exit_state for the sake of kcsan and we agreed that it's unnecessary and not to do this. But it obviously doesn't hurt to have it. Christian ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll 2020-07-04 15:50 ` Christian Brauner @ 2020-07-07 17:09 ` Eric W. Biederman 2020-07-08 0:05 ` Daniel Borkmann 0 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-07-07 17:09 UTC (permalink / raw) To: Christian Brauner Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Christian Brauner <christian.brauner@ubuntu.com> writes: > On Fri, Jul 03, 2020 at 04:37:47PM -0500, Eric W. Biederman wrote: >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: >> >> > On Thu, Jul 02, 2020 at 11:41:37AM -0500, Eric W. Biederman wrote: >> >> Create an independent helper thread_group_exited report return true >> >> when all threads have passed exit_notify in do_exit. AKA all of the >> >> threads are at least zombies and might be dead or completely gone. >> >> >> >> Create this helper by taking the logic out of pidfd_poll where >> >> it is already tested, and adding a missing READ_ONCE on >> >> the read of task->exit_state. >> >> >> >> I will be changing the user mode driver code to use this same logic >> >> to know when a user mode driver needs to be restarted. >> >> >> >> Place the new helper thread_group_exited in kernel/exit.c and >> >> EXPORT it so it can be used by modules. >> >> >> >> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> >> >> --- >> >> include/linux/sched/signal.h | 2 ++ >> >> kernel/exit.c | 24 ++++++++++++++++++++++++ >> >> kernel/fork.c | 6 +----- >> >> 3 files changed, 27 insertions(+), 5 deletions(-) >> >> >> >> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h >> >> index 0ee5e696c5d8..1bad18a1d8ba 100644 >> >> --- a/include/linux/sched/signal.h >> >> +++ b/include/linux/sched/signal.h >> >> @@ -674,6 +674,8 @@ static inline int thread_group_empty(struct task_struct *p) >> >> #define delay_group_leader(p) \ >> >> (thread_group_leader(p) && !thread_group_empty(p)) >> >> >> >> +extern bool thread_group_exited(struct pid *pid); >> >> + >> >> extern struct sighand_struct *__lock_task_sighand(struct task_struct *task, >> >> unsigned long *flags); >> >> >> >> diff --git a/kernel/exit.c b/kernel/exit.c >> >> index d3294b611df1..a7f112feb0f6 100644 >> >> --- a/kernel/exit.c >> >> +++ b/kernel/exit.c >> >> @@ -1713,6 +1713,30 @@ COMPAT_SYSCALL_DEFINE5(waitid, >> >> } >> >> #endif >> >> >> >> +/** >> >> + * thread_group_exited - check that a thread group has exited >> >> + * @pid: tgid of thread group to be checked. >> >> + * >> >> + * Test if thread group is has exited (all threads are zombies, dead >> >> + * or completely gone). >> >> + * >> >> + * Return: true if the thread group has exited. false otherwise. >> >> + */ >> >> +bool thread_group_exited(struct pid *pid) >> >> +{ >> >> + struct task_struct *task; >> >> + bool exited; >> >> + >> >> + rcu_read_lock(); >> >> + task = pid_task(pid, PIDTYPE_PID); >> >> + exited = !task || >> >> + (READ_ONCE(task->exit_state) && thread_group_empty(task)); >> >> + rcu_read_unlock(); >> >> + >> >> + return exited; >> >> +} >> > >> > I'm not sure why you think READ_ONCE was missing. >> > It's different in wait_consider_task() where READ_ONCE is needed because >> > of multiple checks. Here it's done once. >> >> In practice it probably has no effect on the generated code. 
But >> READ_ONCE is about telling the compiler not to be clever. Don't use >> tearing loads or stores etc. When all of the other readers are using >> READ_ONCE I just get nervous if we have a case that doesn't. > > That's not true. The only place where READ_ONCE(->exit_state) is used is > in wait_consider_task() and nowhere else. We had that discussion a while > ago where I or someone proposed to simply place a READ_ONCE() around all > accesses to exit_state for the sake of kcsan and we agreed that it's > unnecessary and not to do this. > But it obviously doesn't hurt to have it. There is a larger discussion to be had around the proper handling of exit_state. In this particular case because we are accessing exit_state with only rcu_read_lock protection, because the outcome of the read is about correctness, and because the compiler has nothing else telling it not to re-read exit_state, I believe we actually need the READ_ONCE. At the same time it would take a pretty special compiler to want to reaccess that field in thread_group_exited. I have looked through and I don't find any of the other access of exit_state where the result is about correctness (so that we care) and we don't hold tasklist_lock. But I have removed the necessary wording from the commit comment. There is a much larger discussion to be had about what to do with exit_state, because I think I found about half the accesses were slightly buggy in one form or another. Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
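Editorial sketch, not part of the thread: the pattern under discussion, reading a concurrently written field exactly once and deciding on that single snapshot, can be shown in user-space C. READ_ONCE_SKETCH below is a simplified stand-in for the kernel macro (a volatile-qualified access using the GCC/Clang __typeof__ extension); the real READ_ONCE does more. struct task_model and model_thread_group_exited are hypothetical names that only mirror the shape of thread_group_exited in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's READ_ONCE(): the volatile
 * access asks the compiler to emit exactly one load of x and not to
 * re-read or split it. Relies on the GCC/Clang __typeof__ extension.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical model of the two facts thread_group_exited() looks at. */
struct task_model {
	int exit_state;		/* nonzero once the task is a zombie or dead */
	bool group_empty;	/* models thread_group_empty(task) */
};

/*
 * Mirrors the shape of thread_group_exited(): a missing task counts as
 * exited; otherwise take one snapshot of exit_state and decide on it.
 */
bool model_thread_group_exited(struct task_model *task)
{
	int state;

	if (!task)
		return true;

	state = READ_ONCE_SKETCH(task->exit_state);
	return state != 0 && task->group_empty;
}
```

With a single use of the value, as here, a compiler is unlikely to generate anything different without the macro, which matches Eric's remark that it would take "a pretty special compiler" to re-read the field. The annotation documents that the field can change underneath the reader.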
* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll 2020-07-07 17:09 ` Eric W. Biederman @ 2020-07-08 0:05 ` Daniel Borkmann 2020-07-08 3:50 ` Eric W. Biederman 0 siblings, 1 reply; 72+ messages in thread From: Daniel Borkmann @ 2020-07-08 0:05 UTC (permalink / raw) To: Eric W. Biederman, Christian Brauner Cc: Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On 7/7/20 7:09 PM, Eric W. Biederman wrote: > Christian Brauner <christian.brauner@ubuntu.com> writes: >> On Fri, Jul 03, 2020 at 04:37:47PM -0500, Eric W. Biederman wrote: >>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes: >>> >>>> On Thu, Jul 02, 2020 at 11:41:37AM -0500, Eric W. Biederman wrote: >>>>> Create an independent helper thread_group_exited report return true >>>>> when all threads have passed exit_notify in do_exit. AKA all of the >>>>> threads are at least zombies and might be dead or completely gone. >>>>> >>>>> Create this helper by taking the logic out of pidfd_poll where >>>>> it is already tested, and adding a missing READ_ONCE on >>>>> the read of task->exit_state. >>>>> >>>>> I will be changing the user mode driver code to use this same logic >>>>> to know when a user mode driver needs to be restarted. >>>>> >>>>> Place the new helper thread_group_exited in kernel/exit.c and >>>>> EXPORT it so it can be used by modules. >>>>> >>>>> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> >>>>> --- >>>>> include/linux/sched/signal.h | 2 ++ >>>>> kernel/exit.c | 24 ++++++++++++++++++++++++ >>>>> kernel/fork.c | 6 +----- >>>>> 3 files changed, 27 insertions(+), 5 deletions(-) >>>>> >>>>> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h >>>>> index 0ee5e696c5d8..1bad18a1d8ba 100644 >>>>> --- a/include/linux/sched/signal.h >>>>> +++ b/include/linux/sched/signal.h >>>>> @@ -674,6 +674,8 @@ static inline int thread_group_empty(struct task_struct *p) >>>>> #define delay_group_leader(p) \ >>>>> (thread_group_leader(p) && !thread_group_empty(p)) >>>>> >>>>> +extern bool thread_group_exited(struct pid *pid); >>>>> + >>>>> extern struct sighand_struct *__lock_task_sighand(struct task_struct *task, >>>>> unsigned long *flags); >>>>> >>>>> diff --git a/kernel/exit.c b/kernel/exit.c >>>>> index d3294b611df1..a7f112feb0f6 100644 >>>>> --- a/kernel/exit.c >>>>> +++ b/kernel/exit.c >>>>> @@ -1713,6 +1713,30 @@ COMPAT_SYSCALL_DEFINE5(waitid, >>>>> } >>>>> #endif >>>>> >>>>> +/** >>>>> + * thread_group_exited - check that a thread group has exited >>>>> + * @pid: tgid of thread group to be checked. >>>>> + * >>>>> + * Test if thread group is has exited (all threads are zombies, dead >>>>> + * or completely gone). >>>>> + * >>>>> + * Return: true if the thread group has exited. false otherwise. >>>>> + */ >>>>> +bool thread_group_exited(struct pid *pid) >>>>> +{ >>>>> + struct task_struct *task; >>>>> + bool exited; >>>>> + >>>>> + rcu_read_lock(); >>>>> + task = pid_task(pid, PIDTYPE_PID); >>>>> + exited = !task || >>>>> + (READ_ONCE(task->exit_state) && thread_group_empty(task)); >>>>> + rcu_read_unlock(); >>>>> + >>>>> + return exited; >>>>> +} >>>> >>>> I'm not sure why you think READ_ONCE was missing. >>>> It's different in wait_consider_task() where READ_ONCE is needed because >>>> of multiple checks. Here it's done once. >>> >>> In practice it probably has no effect on the generated code. 
But >>> READ_ONCE is about telling the compiler not to be clever. Don't use >>> tearing loads or stores etc. When all of the other readers are using >>> READ_ONCE I just get nervous if we have a case that doesn't. >> >> That's not true. The only place where READ_ONCE(->exit_state) is used is >> in wait_consider_task() and nowhere else. We had that discussion a while >> ago where I or someone proposed to simply place a READ_ONCE() around all >> accesses to exit_state for the sake of kcsan and we agreed that it's >> unnecessary and not to do this. >> But it obviously doesn't hurt to have it. > > There is a larger discussion to be had around the proper handling of > exit_state. > > In this particular case because we are accessing exit_state with > only rcu_read_lock protection, because the outcome of the read > is about correctness, and because the compiler has nothing else > telling it not to re-read exit_state, I believe we actually need > the READ_ONCE. > > At the same time it would take a pretty special compiler to want to > reaccess that field in thread_group_exited. > > I have looked through and I don't find any of the other access of > exit_state where the result is about correctness (so that we care) > and we don't hold tasklist_lock. > > But I have removed the necessary wording from the commit comment. Hey Eric, are you planning to push the final version into a topic branch so it can be pulled into bpf-next as discussed earlier? Thanks, Daniel ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll 2020-07-08 0:05 ` Daniel Borkmann @ 2020-07-08 3:50 ` Eric W. Biederman 0 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-08 3:50 UTC (permalink / raw) To: Daniel Borkmann Cc: Christian Brauner, Alexei Starovoitov, linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds Daniel Borkmann <daniel@iogearbox.net> writes: > Hey Eric, are you planning to push the final version into a topic branch > so it can be pulled into bpf-next as discussed earlier? Yes. I just about have it ready. I am taking one last pass through the review comments to make certain I have not missed anything before I do. I am hoping I can get it out tonight. Fingers crossed. Eric ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll 2020-07-02 16:41 ` [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll Eric W. Biederman 2020-07-03 20:30 ` Alexei Starovoitov @ 2020-07-04 16:00 ` Christian Brauner 1 sibling, 0 replies; 72+ messages in thread From: Christian Brauner @ 2020-07-04 16:00 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds On Thu, Jul 02, 2020 at 11:41:37AM -0500, Eric W. Biederman wrote: > Create an independent helper thread_group_exited report return true s/report return/which reports/ > when all threads have passed exit_notify in do_exit. AKA all of the > threads are at least zombies and might be dead or completely gone. > > Create this helper by taking the logic out of pidfd_poll where > it is already tested, and adding a missing READ_ONCE on > the read of task->exit_state. I would prefer to have this comment dropped as this read_once() is not missing as you can see from the comments elsewhere in this thread. > > I will be changing the user mode driver code to use this same logic > to know when a user mode driver needs to be restarted. > > Place the new helper thread_group_exited in kernel/exit.c and > EXPORT it so it can be used by modules. > > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- Minus the typos above and below, this looks good and passes the pidfd and process test-suite. Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Thanks! 
Christian > include/linux/sched/signal.h | 2 ++ > kernel/exit.c | 24 ++++++++++++++++++++++++ > kernel/fork.c | 6 +----- > 3 files changed, 27 insertions(+), 5 deletions(-) > > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > index 0ee5e696c5d8..1bad18a1d8ba 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -674,6 +674,8 @@ static inline int thread_group_empty(struct task_struct *p) > #define delay_group_leader(p) \ > (thread_group_leader(p) && !thread_group_empty(p)) > > +extern bool thread_group_exited(struct pid *pid); > + > extern struct sighand_struct *__lock_task_sighand(struct task_struct *task, > unsigned long *flags); > > diff --git a/kernel/exit.c b/kernel/exit.c > index d3294b611df1..a7f112feb0f6 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -1713,6 +1713,30 @@ COMPAT_SYSCALL_DEFINE5(waitid, > } > #endif > > +/** > + * thread_group_exited - check that a thread group has exited > + * @pid: tgid of thread group to be checked. > + * > + * Test if thread group is has exited (all threads are zombies, dead s/is has exited/has exited/ > + * or completely gone). > + * > + * Return: true if the thread group has exited. false otherwise. 
> + */ > +bool thread_group_exited(struct pid *pid) > +{ > + struct task_struct *task; > + bool exited; > + > + rcu_read_lock(); > + task = pid_task(pid, PIDTYPE_PID); > + exited = !task || > + (READ_ONCE(task->exit_state) && thread_group_empty(task)); > + rcu_read_unlock(); > + > + return exited; > +} > +EXPORT_SYMBOL(thread_group_exited); > + > __weak void abort(void) > { > BUG(); > diff --git a/kernel/fork.c b/kernel/fork.c > index 142b23645d82..bf215af7a904 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1787,22 +1787,18 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f) > */ > static __poll_t pidfd_poll(struct file *file, struct poll_table_struct *pts) > { > - struct task_struct *task; > struct pid *pid = file->private_data; > __poll_t poll_flags = 0; > > poll_wait(file, &pid->wait_pidfd, pts); > > - rcu_read_lock(); > - task = pid_task(pid, PIDTYPE_PID); > /* > * Inform pollers only when the whole thread group exits. > * If the thread group leader exits before all other threads in the > * group, then poll(2) should block, similar to the wait(2) family. > */ > - if (!task || (task->exit_state && thread_group_empty(task))) > + if (thread_group_exited(pid)) > poll_flags = EPOLLIN | EPOLLRDNORM; > - rcu_read_unlock(); > > return poll_flags; > } > -- > 2.25.0 > ^ permalink raw reply [flat|nested] 72+ messages in thread
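The predicate factored out in this patch can be modelled in plain user-space C. This is a toy sketch, not the kernel helper: struct toy_group_task and its fields are hypothetical stand-ins for task_struct, and there is no RCU or pid lookup here, only the boolean shape of the test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the fields the helper inspects. */
struct toy_group_task {
	int exit_state;		/* non-zero once the task has passed exit_notify */
	int other_live_threads;	/* threads in the group besides this one */
};

/* Same shape as the predicate in the patch: a task that can no longer
 * be found counts as exited; otherwise the task must have exited and
 * no other thread may remain in its group. */
static bool toy_thread_group_exited(const struct toy_group_task *task)
{
	return !task || (task->exit_state && task->other_live_threads == 0);
}
```

Note that a leader which has exited while another thread is still alive yields false, which matches the comment in pidfd_poll about blocking until the whole group is gone.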
* [PATCH v3 14/16] bpfilter: Take advantage of the facilities of struct pid 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (12 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 13/16] exit: Factor thread_group_exited out of pidfd_poll Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 15/16] umd: Remove exit_umh Eric W. Biederman ` (3 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman Instead of relying on the exit_umh cleanup callback use the fact a struct pid can be tested to see if a process still exists, and that struct pid has a wait queue that notifies when the process dies. v1: https://lkml.kernel.org/r/87h7uydlu9.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/874kqt4owu.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/bpfilter.h | 3 ++- net/bpfilter/bpfilter_kern.c | 15 +++++---------- net/ipv4/bpfilter/sockopt.c | 15 ++++++++------- 3 files changed, 15 insertions(+), 18 deletions(-) diff --git a/include/linux/bpfilter.h b/include/linux/bpfilter.h index ec9972d822e0..9b114c718a76 100644 --- a/include/linux/bpfilter.h +++ b/include/linux/bpfilter.h @@ -10,6 +10,8 @@ int bpfilter_ip_set_sockopt(struct sock *sk, int optname, char __user *optval, unsigned int optlen); int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval, int __user *optlen); +void bpfilter_umh_cleanup(struct umd_info *info); + struct bpfilter_umh_ops { struct umd_info info; /* since ip_getsockopt() can run in parallel, serialize access to umh */ @@ -18,7 +20,6 @@ struct bpfilter_umh_ops { char __user *optval, unsigned int optlen, bool is_set); int (*start)(void); - bool stop; }; extern struct bpfilter_umh_ops bpfilter_ops; #endif diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c index 08ea77c2b137..9616fb7defeb 100644 --- a/net/bpfilter/bpfilter_kern.c +++ b/net/bpfilter/bpfilter_kern.c @@ -18,10 +18,11 @@ static void shutdown_umh(void) struct umd_info *info = &bpfilter_ops.info; struct pid *tgid = info->tgid; - if (bpfilter_ops.stop) - return; - - kill_pid(tgid, SIGKILL, 1); + if (tgid) { + kill_pid(tgid, SIGKILL, 1); + wait_event(tgid->wait_pidfd, thread_group_exited(tgid)); + bpfilter_umh_cleanup(info); + } } static void __stop_umh(void) @@ -77,7 +78,6 @@ static int start_umh(void) err = fork_usermode_driver(&bpfilter_ops.info); if (err) return err; - bpfilter_ops.stop = false; pr_info("Loaded bpfilter_umh pid %d\n", pid_nr(bpfilter_ops.info.tgid)); /* health check that usermode process started correctly */ @@ -100,16 +100,11 @@ static int __init load_umh(void) return err; mutex_lock(&bpfilter_ops.lock); - if (!bpfilter_ops.stop) { - err = -EFAULT; - goto out; - } err = start_umh(); if (!err && 
IS_ENABLED(CONFIG_INET)) { bpfilter_ops.sockopt = &__bpfilter_process_sockopt; bpfilter_ops.start = &start_umh; } -out: mutex_unlock(&bpfilter_ops.lock); if (err) umd_unload_blob(&bpfilter_ops.info); diff --git a/net/ipv4/bpfilter/sockopt.c b/net/ipv4/bpfilter/sockopt.c index 56cbc43145f6..9063c6767d34 100644 --- a/net/ipv4/bpfilter/sockopt.c +++ b/net/ipv4/bpfilter/sockopt.c @@ -12,16 +12,14 @@ struct bpfilter_umh_ops bpfilter_ops; EXPORT_SYMBOL_GPL(bpfilter_ops); -static void bpfilter_umh_cleanup(struct umd_info *info) +void bpfilter_umh_cleanup(struct umd_info *info) { - mutex_lock(&bpfilter_ops.lock); - bpfilter_ops.stop = true; fput(info->pipe_to_umh); fput(info->pipe_from_umh); put_pid(info->tgid); info->tgid = NULL; - mutex_unlock(&bpfilter_ops.lock); } +EXPORT_SYMBOL_GPL(bpfilter_umh_cleanup); static int bpfilter_mbox_request(struct sock *sk, int optname, char __user *optval, @@ -39,7 +37,11 @@ static int bpfilter_mbox_request(struct sock *sk, int optname, goto out; } } - if (bpfilter_ops.stop) { + if (bpfilter_ops.info.tgid && + thread_group_exited(bpfilter_ops.info.tgid)) + bpfilter_umh_cleanup(&bpfilter_ops.info); + + if (!bpfilter_ops.info.tgid) { err = bpfilter_ops.start(); if (err) goto out; @@ -70,9 +72,8 @@ int bpfilter_ip_get_sockopt(struct sock *sk, int optname, char __user *optval, static int __init bpfilter_sockopt_init(void) { mutex_init(&bpfilter_ops.lock); - bpfilter_ops.stop = true; + bpfilter_ops.info.tgid = NULL; bpfilter_ops.info.driver_name = "bpfilter_umh"; - bpfilter_ops.info.cleanup = &bpfilter_umh_cleanup; return 0; } -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
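The restart-on-demand logic that replaces the stop flag can be sketched as a small user-space state machine. Everything below is hypothetical illustration: struct toy_driver, toy_cleanup, toy_start and toy_request are made-up names, the integer tgid stands in for the struct pid pointer, and the exited flag stands in for what thread_group_exited() would report.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state kept in bpfilter_ops.info. */
struct toy_driver {
	int tgid;	/* 0 means "no driver recorded", like a NULL tgid */
	bool exited;	/* what thread_group_exited() would report */
	int starts;	/* how many times the driver was (re)started */
};

/* Forget the dead driver; the patch also drops pipe and pid references. */
static void toy_cleanup(struct toy_driver *d)
{
	d->tgid = 0;
	d->exited = false;
}

/* Pretend to fork a new driver and record a made-up pid. */
static int toy_start(struct toy_driver *d)
{
	d->tgid = 100 + d->starts;
	d->starts++;
	return 0;
}

/* Same order of checks as the patched bpfilter_mbox_request():
 * reap a dead driver first, then start one if none is recorded. */
static int toy_request(struct toy_driver *d)
{
	if (d->tgid && d->exited)
		toy_cleanup(d);
	if (!d->tgid)
		return toy_start(d);
	return 0;
}
```

In this model the "is it running" answer is derived from the tgid itself on every request, so there is no separate flag that a callback has to keep in sync.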
* [PATCH v3 15/16] umd: Remove exit_umh 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (13 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 14/16] bpfilter: Take advantage of the facilities of struct pid Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 16:41 ` [PATCH v3 16/16] umd: Stop using split_argv Eric W. Biederman ` (2 subsequent siblings) 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman, Greg Kroah-Hartman The bpfilter code no longer uses the umd_info.cleanup callback. This callback is what exit_umh exists to call. So remove exit_umh and all of its associated bookkeeping. v1: https://lkml.kernel.org/r/87bll6dlte.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87y2o53abg.fsf_-_@x220.int.ebiederm.org Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- include/linux/sched.h | 1 - include/linux/usermode_driver.h | 16 ---------------- kernel/exit.c | 3 --- kernel/usermode_driver.c | 28 ---------------------------- 4 files changed, 48 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 59d1e92bb88e..edb2020875ad 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1511,7 +1511,6 @@ extern struct pid *cad_pid; #define PF_KTHREAD 0x00200000 /* I am a kernel thread */ #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ #define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */ -#define PF_UMH 0x02000000 /* I'm an Usermodehelper process */ #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ #define PF_MEMALLOC_NOCMA 0x10000000 /* All allocation request will have _GFP_MOVABLE cleared */ diff --git a/include/linux/usermode_driver.h b/include/linux/usermode_driver.h index 45adbffb31d9..073a9e0ec07d 100644 --- a/include/linux/usermode_driver.h +++ b/include/linux/usermode_driver.h @@ -4,26 +4,10 @@ #include <linux/umh.h> #include <linux/path.h> -#ifdef CONFIG_BPFILTER -void __exit_umh(struct task_struct *tsk); - -static inline void exit_umh(struct task_struct *tsk) -{ - if (unlikely(tsk->flags & PF_UMH)) - __exit_umh(tsk); -} -#else -static inline void exit_umh(struct task_struct *tsk) -{ -} -#endif - struct umd_info { const char *driver_name; struct file *pipe_to_umh; struct file *pipe_from_umh; - struct list_head list; - void (*cleanup)(struct umd_info *info); struct path wd; struct pid *tgid; }; diff --git a/kernel/exit.c b/kernel/exit.c index a7f112feb0f6..4ec82859bfe5 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -63,7 +63,6 @@ #include <linux/random.h> #include <linux/rcuwait.h> #include <linux/compat.h> -#include <linux/usermode_driver.h> #include <linux/uaccess.h> #include <asm/unistd.h> @@ -805,8 
+804,6 @@ void __noreturn do_exit(long code) exit_task_namespaces(tsk); exit_task_work(tsk); exit_thread(tsk); - if (group_dead) - exit_umh(tsk); /* * Flush inherited counters to the parent - before the parent diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c index f77f8d7ce9e3..cd136f86f799 100644 --- a/kernel/usermode_driver.c +++ b/kernel/usermode_driver.c @@ -9,9 +9,6 @@ #include <linux/task_work.h> #include <linux/usermode_driver.h> -static LIST_HEAD(umh_list); -static DEFINE_MUTEX(umh_list_lock); - static struct vfsmount *blob_to_mnt(const void *data, size_t len, const char *name) { struct file_system_type *type; @@ -134,7 +131,6 @@ static int umd_setup(struct subprocess_info *info, struct cred *new) umd_info->pipe_to_umh = to_umh[1]; umd_info->pipe_from_umh = from_umh[0]; umd_info->tgid = get_pid(task_tgid(current)); - current->flags |= PF_UMH; return 0; } @@ -182,11 +178,6 @@ int fork_usermode_driver(struct umd_info *info) goto out; err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); - if (!err) { - mutex_lock(&umh_list_lock); - list_add(&info->list, &umh_list); - mutex_unlock(&umh_list_lock); - } out: if (argv) argv_free(argv); @@ -194,23 +185,4 @@ int fork_usermode_driver(struct umd_info *info) } EXPORT_SYMBOL_GPL(fork_usermode_driver); -void __exit_umh(struct task_struct *tsk) -{ - struct umd_info *info; - struct pid *tgid = task_tgid(tsk); - - mutex_lock(&umh_list_lock); - list_for_each_entry(info, &umh_list, list) { - if (info->tgid == tgid) { - list_del(&info->list); - mutex_unlock(&umh_list_lock); - goto out; - } - } - mutex_unlock(&umh_list_lock); - return; -out: - if (info->cleanup) - info->cleanup(info); -} -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
* [PATCH v3 16/16] umd: Stop using split_argv 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (14 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 15/16] umd: Remove exit_umh Eric W. Biederman @ 2020-07-02 16:41 ` Eric W. Biederman 2020-07-02 23:51 ` [PATCH v3 00/16] Make the user mode driver code a better citizen Tetsuo Handa 2020-07-09 22:05 ` [merged][PATCH " Eric W. Biederman 17 siblings, 0 replies; 72+ messages in thread From: Eric W. Biederman @ 2020-07-02 16:41 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner, Eric W. Biederman There is exactly one argument so there is nothing to split. All split_argv does now is cause confusion and avoid the need for a cast when passing a "const char *" string to call_usermodehelper_setup. So avoid confusion and the possibility of an odd driver name causing problems by just using a fixed argv array with a cast in the call to call_usermodehelper_setup. v1: https://lkml.kernel.org/r/87sged3a9n.fsf_-_@x220.int.ebiederm.org Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- kernel/usermode_driver.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/kernel/usermode_driver.c b/kernel/usermode_driver.c index cd136f86f799..0b35212ffc3d 100644 --- a/kernel/usermode_driver.c +++ b/kernel/usermode_driver.c @@ -160,27 +160,21 @@ static void umd_cleanup(struct subprocess_info *info) int fork_usermode_driver(struct umd_info *info) { struct subprocess_info *sub_info; - char **argv = NULL; + const char *argv[] = { info->driver_name, NULL }; int err; if (WARN_ON_ONCE(info->tgid)) return -EBUSY; err = -ENOMEM; - argv = argv_split(GFP_KERNEL, info->driver_name, NULL); - if (!argv) - goto out; - - sub_info = call_usermodehelper_setup(info->driver_name, argv, NULL, - GFP_KERNEL, + sub_info = call_usermodehelper_setup(info->driver_name, + (char **)argv, NULL, GFP_KERNEL, umd_setup, umd_cleanup, info); if (!sub_info) goto out; err = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); out: - if (argv) - argv_free(argv); return err; } EXPORT_SYMBOL_GPL(fork_usermode_driver); -- 2.25.0 ^ permalink raw reply related [flat|nested] 72+ messages in thread
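The fixed argv array with a cast can be tried out in user space. This is a sketch under stated assumptions: toy_count_args is a hypothetical consumer that merely has the same char ** parameter type as the argv argument of call_usermodehelper_setup, and toy_single_arg_count is a made-up wrapper showing the array construction.

```c
#include <stddef.h>

/* Hypothetical consumer taking a NULL-terminated char ** vector. */
static size_t toy_count_args(char **argv)
{
	size_t n = 0;

	while (argv[n])
		n++;
	return n;
}

/* Build the argument vector on the stack: one name, one terminator.
 * Nothing is allocated, so there is nothing to free and no allocation
 * failure path for argv. The cast only drops the const qualifier. */
static size_t toy_single_arg_count(const char *driver_name)
{
	const char *argv[] = { driver_name, NULL };

	return toy_count_args((char **)argv);
}
```

A name containing whitespace still produces exactly one argument here, which is the "odd driver name" case the commit message wants to rule out.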
* Re: [PATCH v3 00/16] Make the user mode driver code a better citizen 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (15 preceding siblings ...) 2020-07-02 16:41 ` [PATCH v3 16/16] umd: Stop using split_argv Eric W. Biederman @ 2020-07-02 23:51 ` Tetsuo Handa 2020-07-09 22:05 ` [merged][PATCH " Eric W. Biederman 17 siblings, 0 replies; 72+ messages in thread From: Tetsuo Handa @ 2020-07-02 23:51 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner On 2020/07/03 1:40, Eric W. Biederman wrote: > > This is the third round of my changeset to split the user mode driver > code from the user mode helper code, and to make the code use common > facilities to get things done instead of recreating them just > for the user mode driver code. I won't test this version, for you are ignoring my comments. ^ permalink raw reply [flat|nested] 72+ messages in thread
* [merged][PATCH v3 00/16] Make the user mode driver code a better citizen 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman ` (16 preceding siblings ...) 2020-07-02 23:51 ` [PATCH v3 00/16] Make the user mode driver code a better citizen Tetsuo Handa @ 2020-07-09 22:05 ` Eric W. Biederman 2020-07-14 19:42 ` Alexei Starovoitov 17 siblings, 1 reply; 72+ messages in thread From: Eric W. Biederman @ 2020-07-09 22:05 UTC (permalink / raw) To: linux-kernel Cc: David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner I have merged all of this into my exec-next tree. The code is also available on the frozen branch: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git usermode-driver-cleanup The range-diff from the last posted version is below. I was asked "Is there a simpler version of this code that could be used for backports?". The honest answer is not really. Fundamentally do_execve_file as it existed prior to this set of changes broke a lot of invariants in exec. The choices are either to track down all of the invariants it violates and fix them, or to reorganize the code so that do_execve_file is unnecessary. Reorganizing the code was the path I found simplest and most reliable. I don't think anyone has tracked down all of the constraints the code violated. There is an issue, clearly pointed out by Tetsuo Handa: in theory, if there is too long a delay between closing the file after writing it and calling flush_delayed_fput, flush_delayed_fput might not flush the file synchronously. I cannot trigger it, and this is the same code path the initramfs relies upon. So I think calling flush_delayed_fput is good enough for this set of changes. 
If and when a generally accepted way to remove the theoretical race is found, it will be trivial to fix flush_delayed_fput or replace it, and none of the other logic changes. Declaring this set of changes done now, allows the work that depends upon this change to proceed. Eric --- 1: 8fee10be3e7e ! 1: 5fec25f2cb95 umh: Capture the pid in umh_pipe_setup @@ Commit message v1: https://lkml.kernel.org/r/87h7uygf9i.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/875zb97iix.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-1-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umh.h ## 2: 2d97bc5269dd ! 2: b044fa2ae50d umh: Move setting PF_UMH into umh_pipe_setup @@ Commit message v1: https://lkml.kernel.org/r/87bll6gf8t.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87zh8l63xs.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-2-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## kernel/umh.c ## 3: 974e2b827aca ! 3: 3a171042aeab umh: Rename the user mode driver helpers for clarity @@ Commit message v1: https://lkml.kernel.org/r/875zbegf82.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87tuyt63x3.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-3-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## kernel/umh.c ## 4: 6c8f72f8eb49 ! 
4: 21d598280675 umh: Remove call_usermodehelper_setup_file. @@ Commit message v1: https://lkml.kernel.org/r/87zh8qf0mp.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87o8p163u1.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-4-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umh.h ## 5: cbf6c2b5a04a ! 5: 884c5e683b67 umh: Separate the user mode driver and the user mode helper support @@ Commit message v1: https://lkml.kernel.org/r/87tuyyf0ln.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87imf963s6.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-5-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/bpfilter.h ## 6: b68617fd4ee3 ! 6: 74be2d3b80af umd: For clarity rename umh_info umd_info @@ Commit message v1: https://lkml.kernel.org/r/87o8p6f0kw.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/878sg563po.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-6-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/bpfilter.h ## 7: 6881acff5f6a ! 
7: 1199c6c3da51 umd: Rename umd_info.cmdline umd_info.driver_name @@ Commit message v1: https://lkml.kernel.org/r/87imfef0k3.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87366d63os.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-7-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/usermode_driver.h ## 8: cd210622ff6f ! 8: e2dc9bf3f527 umd: Transform fork_usermode_blob into fork_usermode_driver @@ Commit message [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ v1: https://lkml.kernel.org/r/87d05mf0j9.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87wo3p4p35.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-8-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/usermode_driver.h ## 9: 74d65aaf2cab ! 9: 55e6074e3fa6 umh: Stop calling do_execve_file @@ Commit message v1: https://lkml.kernel.org/r/877dvuf0i7.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87r1tx4p2a.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-9-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/umh.h ## 10: 58a9854274a1 ! 
10: 25cf336de51b exec: Remove do_execve_file @@ Commit message [1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/ v1: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87lfk54p0m.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-10-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## fs/exec.c ## 11: c45ae16a18c9 ! 11: 0fe3c63148ef bpfilter: Move bpfilter_umh back into init data @@ Commit message v1: https://lkml.kernel.org/r/87sgeidlvq.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87ftad4ozc.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-11-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## net/bpfilter/bpfilter_umh_blob.S ## 12: 43b41b9d52a0 ! 12: 1c340ead18ee umd: Track user space drivers with struct pid @@ Commit message v1: https://lkml.kernel.org/r/87mu4qdlv2.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/a70l4oy8.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-12-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/usermode_driver.h ## 13: 653476c24a30 ! 
13: 38fd525a4c61 exit: Factor thread_group_exited out of pidfd_poll @@ Metadata ## Commit message ## exit: Factor thread_group_exited out of pidfd_poll - Create an independent helper thread_group_exited report return true + Create an independent helper thread_group_exited which returns true when all threads have passed exit_notify in do_exit. AKA all of the threads are at least zombies and might be dead or completely gone. - Create this helper by taking the logic out of pidfd_poll where - it is already tested, and adding a missing READ_ONCE on - the read of task->exit_state. + Create this helper by taking the logic out of pidfd_poll where it is + already tested, and adding a READ_ONCE on the read of + task->exit_state. I will be changing the user mode driver code to use this same logic to know when a user mode driver needs to be restarted. @@ Commit message Place the new helper thread_group_exited in kernel/exit.c and EXPORT it so it can be used by modules. + Link: https://lkml.kernel.org/r/20200702164140.4468-13-ebiederm@xmission.com + Acked-by: Christian Brauner <christian.brauner@ubuntu.com> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/sched/signal.h ## @@ kernel/exit.c: COMPAT_SYSCALL_DEFINE5(waitid, + * thread_group_exited - check that a thread group has exited + * @pid: tgid of thread group to be checked. + * -+ * Test if thread group is has exited (all threads are zombies, dead -+ * or completely gone). ++ * Test if the thread group represented by tgid has exited (all ++ * threads are zombies, dead or completely gone). + * + * Return: true if the thread group has exited. false otherwise. + */ 14: 7ad037d12723 ! 
14: e80eb1dc868b bpfilter: Take advantage of the facilities of struct pid @@ Commit message v1: https://lkml.kernel.org/r/87h7uydlu9.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/874kqt4owu.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-14-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/bpfilter.h ## 15: e50cf5e57a62 ! 15: 8c2f52663973 umd: Remove exit_umh @@ Commit message v1: https://lkml.kernel.org/r/87bll6dlte.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87y2o53abg.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-15-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## include/linux/sched.h ## 16: 32e057d8aa4a ! 16: 33c326014fe6 umd: Stop using split_argv @@ Commit message call_usermodehelper_setup. v1: https://lkml.kernel.org/r/87sged3a9n.fsf_-_@x220.int.ebiederm.org + Link: https://lkml.kernel.org/r/20200702164140.4468-16-ebiederm@xmission.com + Acked-by: Alexei Starovoitov <ast@kernel.org> + Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> ## kernel/usermode_driver.c ## ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [merged][PATCH v3 00/16] Make the user mode driver code a better citizen 2020-07-09 22:05 ` [merged][PATCH " Eric W. Biederman @ 2020-07-14 19:42 ` Alexei Starovoitov 0 siblings, 0 replies; 72+ messages in thread From: Alexei Starovoitov @ 2020-07-14 19:42 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Luis Chamberlain, Linus Torvalds, Christian Brauner On Thu, Jul 09, 2020 at 05:05:09PM -0500, Eric W. Biederman wrote: > > I have merged all of this into my exec-next tree. > > The code is also available on the frozen branch: > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git usermode-driver-cleanup > > Declaring this set of changes done now, allows the work that depends > upon this change to proceed. Now I've pulled it into bpf-next as well. In the meantime there were changes to kernel_write that broke bpfilter.ko; I fixed that up as well. Thanks. ^ permalink raw reply [flat|nested] 72+ messages in thread
* Re: [PATCH v2 00/15] Make the user mode driver code a better citizen 2020-06-29 19:55 ` [PATCH v2 00/15] Make the user mode driver code a better citizen Eric W. Biederman ` (16 preceding siblings ...) 2020-07-02 16:40 ` [PATCH v3 00/16] " Eric W. Biederman @ 2020-07-08 5:20 ` Luis Chamberlain 17 siblings, 0 replies; 72+ messages in thread From: Luis Chamberlain @ 2020-07-08 5:20 UTC (permalink / raw) To: Eric W. Biederman Cc: linux-kernel, David Miller, Greg Kroah-Hartman, Tetsuo Handa, Alexei Starovoitov, Kees Cook, Andrew Morton, Alexei Starovoitov, Al Viro, bpf, linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele, LSM List, Casey Schaufler, Linus Torvalds On Mon, Jun 29, 2020 at 02:55:05PM -0500, Eric W. Biederman wrote: > > I have tested thes changes by booting with the code compiled in and > by killing "bpfilter_umh" and running iptables -vnL to restart > the userspace driver. > > I have compiled tested each change with and without CONFIG_BPFILTER > enabled. Sounds like grounds for a selftests driver and a respective selftest? And if so, have the other issues Tetsuo reported been hacked into one? Luis ^ permalink raw reply [flat|nested] 72+ messages in thread