It is hard to follow the control flow in exec.c as the code has evolved
over time and something that used to work one way now works another.
This set of changes attempts to address the worst of that, to remove
unnecessary work and to make the code a little easier to follow.

The one rough point in my changes is that cap_bprm_set_creds probably
needs a new name as I have taken it out of security_bprm_set_creds, but
my imagination failed to come up with anything better.

Eric W. Biederman (5):
      exec: Call cap_bprm_set_creds directly from prepare_binprm
      exec: Directly call security_bprm_set_creds from __do_execve_file
      exec: Remove recursion from search_binary_handler
      exec: Allow load_misc_binary to call prepare_binfmt unconditionally
      exec: Move the call of prepare_binprm into search_binary_handler

 arch/alpha/kernel/binfmt_loader.c |  5 +----
 fs/binfmt_em86.c                  |  7 +-----
 fs/binfmt_misc.c                  | 22 +++---------------
 fs/binfmt_script.c                |  5 +----
 fs/exec.c                         | 47 +++++++++++++++++++++------------------
 include/linux/binfmts.h           | 11 ++-------
 include/linux/security.h          |  2 +-
 security/apparmor/domain.c        |  3 ---
 security/commoncap.c              |  1 -
 security/selinux/hooks.c          |  2 --
 security/smack/smack_lsm.c        |  3 ---
 security/tomoyo/tomoyo.c          |  6 -----
 12 files changed, 34 insertions(+), 80 deletions(-)

---

I think this is a correct set of changes that makes things better, but
please look things over/review this code if you have any expertise in
anything I am touching.

Thank you,
Eric
The function cap_bprm_set_creds is the only instance of
security_bprm_set_creds that does something for the primary executable
file and for every interpreter the rest of the implementations of
security_bprm_set_creds do something only for the primary executable
file even if that file is a shell script.

The function cap_bprm_set_creds is also special in that it is called
even when CONFIG_SECURITY is unset.

So calling cap_bprm_set_creds separately to make these two cases explicit,
and allow future changes to take advantages of these differences
to simplify the code.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                | 4 ++++
 include/linux/security.h | 2 +-
 security/commoncap.c     | 1 -
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b0620d5ebc66..765bfd51a546 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1641,6 +1641,10 @@ int prepare_binprm(struct linux_binprm *bprm)
 		return retval;
 	bprm->called_set_creds = 1;
 
+	retval = cap_bprm_set_creds(bprm);
+	if (retval)
+		return retval;
+
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
 }
diff --git a/include/linux/security.h b/include/linux/security.h
index a8d9310472df..c1aa1638429a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -571,7 +571,7 @@ static inline int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
 
 static inline int security_bprm_set_creds(struct linux_binprm *bprm)
 {
-	return cap_bprm_set_creds(bprm);
+	return 0;
 }
 
 static inline int security_bprm_check(struct linux_binprm *bprm)
diff --git a/security/commoncap.c b/security/commoncap.c
index f4ee0ae106b2..3757988abe42 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1346,7 +1346,6 @@ static struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(ptrace_traceme, cap_ptrace_traceme),
 	LSM_HOOK_INIT(capget, cap_capget),
 	LSM_HOOK_INIT(capset, cap_capset),
-	LSM_HOOK_INIT(bprm_set_creds, cap_bprm_set_creds),
 	LSM_HOOK_INIT(inode_need_killpriv, cap_inode_need_killpriv),
 	LSM_HOOK_INIT(inode_killpriv, cap_inode_killpriv),
 	LSM_HOOK_INIT(inode_getsecurity, cap_inode_getsecurity),
-- 
2.25.0
Now that security_bprm_set_creds is no longer responsible for calling
cap_bprm_set_creds, security_bprm_set_creds only does something for
the primary file that is being executed (not any interpreters it may
have). Therefore call security_bprm_set_creds from __do_execve_file,
instead of from prepare_binprm so that it is only called once, and
remove the now unnecessary called_set_creds field of struct binprm.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                  | 11 +++++------
 include/linux/binfmts.h    |  6 ------
 security/apparmor/domain.c |  3 ---
 security/selinux/hooks.c   |  2 --
 security/smack/smack_lsm.c |  3 ---
 security/tomoyo/tomoyo.c   |  6 ------
 6 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 765bfd51a546..635b5085050c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1635,12 +1635,6 @@ int prepare_binprm(struct linux_binprm *bprm)
 
 	bprm_fill_uid(bprm);
 
-	/* fill in binprm security blob */
-	retval = security_bprm_set_creds(bprm);
-	if (retval)
-		return retval;
-	bprm->called_set_creds = 1;
-
 	retval = cap_bprm_set_creds(bprm);
 	if (retval)
 		return retval;
@@ -1858,6 +1852,11 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval < 0)
 		goto out;
 
+	/* fill in binprm security blob */
+	retval = security_bprm_set_creds(bprm);
+	if (retval)
+		goto out;
+
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		goto out;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 1b48e2154766..42f760acfc2c 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,12 +26,6 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
-		/*
-		 * True after the bprm_set_creds hook has been called once
-		 * (multiple calls can be made via prepare_binprm() for
-		 * binfmt_script/misc).
-		 */
-		called_set_creds:1,
 		/*
 		 * True if most recent call to the commoncaps bprm_set_creds
 		 * hook (due to multiple prepare_binprm() calls from the
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 6ceb74e0f789..61b9181a9e1f 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -875,9 +875,6 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 		file_inode(bprm->file)->i_mode
 	};
 
-	if (bprm->called_set_creds)
-		return 0;
-
 	ctx = task_ctx(current);
 	AA_BUG(!cred_label(bprm->cred));
 	AA_BUG(!ctx);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0b4e32161b77..ff3e1be53da5 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2297,8 +2297,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 
 	/* SELinux context only depends on initial program or script and not
 	 * the script interpreter */
-	if (bprm->called_set_creds)
-		return 0;
 
 	old_tsec = selinux_cred(current_cred());
 	new_tsec = selinux_cred(bprm->cred);
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 8c61d175e195..bd1967730fec 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -904,9 +904,6 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
 	struct superblock_smack *sbsp;
 	int rc;
 
-	if (bprm->called_set_creds)
-		return 0;
-
 	isp = smack_inode(inode);
 	if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task)
 		return 0;
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 716c92ec941a..d965ce80a7fb 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -71,12 +71,6 @@ static void tomoyo_bprm_committed_creds(struct linux_binprm *bprm)
  */
 static int tomoyo_bprm_set_creds(struct linux_binprm *bprm)
 {
-	/*
-	 * Do only if this function is called for the first time of an execve
-	 * operation.
-	 */
-	if (bprm->called_set_creds)
-		return 0;
 	/*
 	 * Load policy if /sbin/tomoyo-init exists and /sbin/init is requested
 	 * for the first time.
-- 
2.25.0
Instead of recursing in search_binary_handler have the methods that
would recurse return a positive value, and simply loop in exec_binprm.

This is a trivial change as all of the methods that would recurse do
so as effectively the last thing they do. Making this a trivial code
change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/alpha/kernel/binfmt_loader.c |  2 +-
 fs/binfmt_em86.c                  |  2 +-
 fs/binfmt_misc.c                  |  5 +----
 fs/binfmt_script.c                |  2 +-
 fs/exec.c                         | 20 +++++++++-----------
 include/linux/binfmts.h           |  2 --
 6 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c
index a8d0d6e06526..a90c8b1d5498 100644
--- a/arch/alpha/kernel/binfmt_loader.c
+++ b/arch/alpha/kernel/binfmt_loader.c
@@ -38,7 +38,7 @@ static int load_binary(struct linux_binprm *bprm)
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		return retval;
-	return search_binary_handler(bprm);
+	return 1; /* Search for the interpreter */
 }
 
 static struct linux_binfmt loader_format = {
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index 466497860c62..a9b9ac7f9bb0 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -95,7 +95,7 @@ static int load_em86(struct linux_binprm *bprm)
 	if (retval < 0)
 		return retval;
 
-	return search_binary_handler(bprm);
+	return 1; /* Search for the interpreter */
 }
 
 static struct linux_binfmt em86_format = {
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index cdb45829354d..127fae9c21ab 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -234,10 +234,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		goto error;
 
-	retval = search_binary_handler(bprm);
-	if (retval < 0)
-		goto error;
-
+	retval = 1; /* Search for the interpreter */
 ret:
 	dput(fmt->dentry);
 	return retval;
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index e9e6a6f4a35f..76a05696d376 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -146,7 +146,7 @@ static int load_script(struct linux_binprm *bprm)
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		return retval;
-	return search_binary_handler(bprm);
+	return 1; /* Search for the interpreter */
 }
 
 static struct linux_binfmt script_format = {
diff --git a/fs/exec.c b/fs/exec.c
index 635b5085050c..8bbf5fa785a6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1690,16 +1690,12 @@ EXPORT_SYMBOL(remove_arg_zero);
 /*
  * cycle the list of binary formats handler, until one recognizes the image
  */
-int search_binary_handler(struct linux_binprm *bprm)
+static int search_binary_handler(struct linux_binprm *bprm)
 {
 	bool need_retry = IS_ENABLED(CONFIG_MODULES);
 	struct linux_binfmt *fmt;
 	int retval;
 
-	/* This allows 4 levels of binfmt rewrites before failing hard. */
-	if (bprm->recursion_depth > 5)
-		return -ELOOP;
-
 	retval = security_bprm_check(bprm);
 	if (retval)
 		return retval;
@@ -1712,10 +1708,7 @@ int search_binary_handler(struct linux_binprm *bprm)
 			continue;
 		read_unlock(&binfmt_lock);
 
-		bprm->recursion_depth++;
 		retval = fmt->load_binary(bprm);
-		bprm->recursion_depth--;
-
 		read_lock(&binfmt_lock);
 		put_binfmt(fmt);
 		if (bprm->point_of_no_return || !bprm->file ||
@@ -1738,12 +1731,11 @@ int search_binary_handler(struct linux_binprm *bprm)
 
 	return retval;
 }
-EXPORT_SYMBOL(search_binary_handler);
 
 static int exec_binprm(struct linux_binprm *bprm)
 {
 	pid_t old_pid, old_vpid;
-	int ret;
+	int ret, depth = 0;
 
 	/* Need to fetch pid before load_binary changes it */
 	old_pid = current->pid;
@@ -1751,7 +1743,13 @@ static int exec_binprm(struct linux_binprm *bprm)
 	old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
 	rcu_read_unlock();
 
-	ret = search_binary_handler(bprm);
+	do {
+		depth++;
+		ret = search_binary_handler(bprm);
+		/* This allows 4 levels of binfmt rewrites before failing hard. */
+		if ((ret > 0) && (depth > 5))
+			ret = -ELOOP;
+	} while (ret > 0);
 	if (ret >= 0) {
 		audit_bprm(bprm);
 		trace_sched_process_exec(current, old_pid, bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 42f760acfc2c..89f1135dcb75 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -47,7 +47,6 @@ struct linux_binprm {
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif
-	unsigned int recursion_depth; /* only for search_binary_handler() */
 	struct file * file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
@@ -118,7 +117,6 @@ extern void unregister_binfmt(struct linux_binfmt *);
 
 extern int prepare_binprm(struct linux_binprm *);
 extern int __must_check remove_arg_zero(struct linux_binprm *);
-extern int search_binary_handler(struct linux_binprm *);
 extern int begin_new_exec(struct linux_binprm * bprm);
 extern void setup_new_exec(struct linux_binprm * bprm);
 extern void finalize_exec(struct linux_binprm *bprm);
-- 
2.25.0
Add a flag preserve_creds that binfmt_misc can set to prevent
credentials from being updated. This allows binfmt_misc to always
call prepare_binprm, which in turn allows the credential computation
logic to be consolidated.

Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/binfmt_misc.c        | 15 +++------------
 fs/exec.c               | 14 +++++++++-----
 include/linux/binfmts.h |  2 ++
 3 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 127fae9c21ab..16bfafd2671d 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -218,19 +218,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
 		goto error;
 
 	bprm->file = interp_file;
-	if (fmt->flags & MISC_FMT_CREDENTIALS) {
-		loff_t pos = 0;
-
-		/*
-		 * No need to call prepare_binprm(), it's already been
-		 * done.  bprm->buf is stale, update from interp_file.
-		 */
-		memset(bprm->buf, 0, BINPRM_BUF_SIZE);
-		retval = kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE,
-				&pos);
-	} else
-		retval = prepare_binprm(bprm);
+	if (fmt->flags & MISC_FMT_CREDENTIALS)
+		bprm->preserve_creds = 1;
 
+	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		goto error;
 
diff --git a/fs/exec.c b/fs/exec.c
index 8bbf5fa785a6..01dbeb025c46 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1630,14 +1630,18 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
  */
 int prepare_binprm(struct linux_binprm *bprm)
 {
-	int retval;
 	loff_t pos = 0;
 
-	bprm_fill_uid(bprm);
+	if (!bprm->preserve_creds) {
+		int retval;
 
-	retval = cap_bprm_set_creds(bprm);
-	if (retval)
-		return retval;
+		bprm_fill_uid(bprm);
+
+		retval = cap_bprm_set_creds(bprm);
+		if (retval)
+			return retval;
+	}
+	bprm->preserve_creds = 0;
 
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 89f1135dcb75..cb016f001e7a 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,6 +26,8 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
+		/* Don't update the creds for an interpreter (see binfmt_misc) */
+		preserve_creds:1,
 		/*
 		 * True if most recent call to the commoncaps bprm_set_creds
 		 * hook (due to multiple prepare_binprm() calls from the
-- 
2.25.0
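
To make the one-shot behaviour of the preserve_creds flag concrete, here
is a minimal userspace sketch of the patched prepare_binprm() logic. It
is not kernel code: struct fake_bprm, fake_prepare_binprm() and
cred_updates_after_misc_exec() are hypothetical names, and the
credential recomputation and the buffer read are reduced to counters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct linux_binprm; not the kernel structure. */
struct fake_bprm {
	unsigned int preserve_creds:1;	/* set by the binfmt_misc model below */
	int cred_updates;		/* models bprm_fill_uid() + cap_bprm_set_creds() */
	int buf_reads;			/* models the kernel_read() into bprm->buf */
};

/* Models the patched prepare_binprm(): the flag skips one credential update. */
int fake_prepare_binprm(struct fake_bprm *bprm)
{
	assert(bprm != NULL);

	if (!bprm->preserve_creds)
		bprm->cred_updates++;
	bprm->preserve_creds = 0;	/* one-shot: cleared on every call */

	bprm->buf_reads++;		/* the buffer is always refreshed */
	return 0;
}

/*
 * Models an exec that passes through a binfmt_misc entry and then through
 * `extra_interpreters` further handlers (for example a script interpreter).
 * Returns how many times the credentials were recomputed.
 */
int cred_updates_after_misc_exec(int credentials_flag, int extra_interpreters)
{
	struct fake_bprm bprm = { 0 };
	int i;

	fake_prepare_binprm(&bprm);		/* primary executable */

	if (credentials_flag)
		bprm.preserve_creds = 1;	/* models the MISC_FMT_CREDENTIALS case */
	fake_prepare_binprm(&bprm);		/* binfmt_misc interpreter */

	for (i = 0; i < extra_interpreters; i++)
		fake_prepare_binprm(&bprm);	/* later interpreters */

	return bprm.cred_updates;
}
```

In this model the flag suppresses exactly one credential update: the
one for the file that binfmt_misc just installed. Any interpreter
reached after that is processed normally again.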
The code in prepare_binprm needs to be run every time
search_binary_handler is called, so move the call into
search_binary_handler itself to make the code simpler and easier to
understand.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/alpha/kernel/binfmt_loader.c |  3 ---
 fs/binfmt_em86.c                  |  5 -----
 fs/binfmt_misc.c                  |  4 ----
 fs/binfmt_script.c                |  3 ---
 fs/exec.c                         | 12 +++++-------
 include/linux/binfmts.h           |  1 -
 6 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c
index a90c8b1d5498..ec7b26e4b81a 100644
--- a/arch/alpha/kernel/binfmt_loader.c
+++ b/arch/alpha/kernel/binfmt_loader.c
@@ -35,9 +35,6 @@ static int load_binary(struct linux_binprm *bprm)
 
 	bprm->file = file;
 	bprm->loader = loader;
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
 	return 1; /* Search for the interpreter */
 }
 
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index a9b9ac7f9bb0..2726bfb832b2 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -90,11 +90,6 @@ static int load_em86(struct linux_binprm *bprm)
 		return PTR_ERR(file);
 
 	bprm->file = file;
-
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
-
 	return 1; /* Search for the interpreter */
 }
 
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 16bfafd2671d..6b5e67eed65e 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -221,10 +221,6 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (fmt->flags & MISC_FMT_CREDENTIALS)
 		bprm->preserve_creds = 1;
 
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		goto error;
-
 	retval = 1; /* Search for the interpreter */
 ret:
 	dput(fmt->dentry);
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 76a05696d376..ed4607c7095e 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -143,9 +143,6 @@ static int load_script(struct linux_binprm *bprm)
 		return PTR_ERR(file);
 
 	bprm->file = file;
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
 	return 1; /* Search for the interpreter */
 }
 
diff --git a/fs/exec.c b/fs/exec.c
index 01dbeb025c46..206f18120073 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1628,7 +1628,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
  *
  * This may be called multiple times for binary chains (scripts for example).
  */
-int prepare_binprm(struct linux_binprm *bprm)
+static int prepare_binprm(struct linux_binprm *bprm)
 {
 	loff_t pos = 0;
 
@@ -1647,8 +1647,6 @@ int prepare_binprm(struct linux_binprm *bprm)
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
 }
 
-EXPORT_SYMBOL(prepare_binprm);
-
 /*
  * Arguments are '\0' separated strings found at the location bprm->p
  * points to; chop off the first by relocating brpm->p to right after
@@ -1700,6 +1698,10 @@ static int search_binary_handler(struct linux_binprm *bprm)
 	struct linux_binfmt *fmt;
 	int retval;
 
+	retval = prepare_binprm(bprm);
+	if (retval < 0)
+		return retval;
+
 	retval = security_bprm_check(bprm);
 	if (retval)
 		return retval;
@@ -1859,10 +1861,6 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval)
 		goto out;
 
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		goto out;
-
 	retval = copy_strings_kernel(1, &bprm->filename, bprm);
 	if (retval < 0)
 		goto out;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index cb016f001e7a..0748afca40cb 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -117,7 +117,6 @@ static inline void insert_binfmt(struct linux_binfmt *fmt)
 
 extern void unregister_binfmt(struct linux_binfmt *);
 
-extern int prepare_binprm(struct linux_binprm *);
 extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int begin_new_exec(struct linux_binprm * bprm);
 extern void setup_new_exec(struct linux_binprm * bprm);
-- 
2.25.0
On Sat, May 9, 2020 at 12:44 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> The function cap_bprm_set_creds is the only instance of
> security_bprm_set_creds that does something for the primary executable
> file and for every interpreter the rest of the implementations of
> security_bprm_set_creds do something only for the primary executable
> file even if that file is a shell script.

Eric, can you please re-write that sentence as something that can be
parsed and understood?

I'm pretty sure that what you are talking about is the whole
"called_set_creds" flag logic, where the logic is that some security
layers only react to the first one, while the capability checks are
done for every one. But there is no way to realize that from your
description above.

In fact, the description above is actively incorrect and misleading,
since you say that "cap_bprm_set_creds is the only instance [..] that
does something for the primary executable"

I think that you mean to say that it does something for *every*
instance of the executable, not just the primary one.

> The function cap_bprm_set_creds is also special in that it is called
> even when CONFIG_SECURITY is unset.
>
> So calling cap_bprm_set_creds separately to make these two cases explicit,
> and allow future changes to take advantages of these differences
> to simplify the code.

I think you need to rename "security_bprm_set_creds()" too, to show
what it does.

Since it clearly no longer does that "bprm_set_creds()" from the
common capabilities.

In fact, I think it would probably be good to change the patch too, so
that it is actually understandable what the heck the logic is.

Instead of

        retval = security_bprm_set_creds(bprm);
        if (retval)
                return retval;
        bprm->called_set_creds = 1;

        retval = cap_bprm_set_creds(bprm);
        if (retval)
                return retval;

which makes no sense at all when you read it, do this:

        /* Every instance of the executable gets called for capabilities */
        retval = cap_bprm_set_creds(bprm);
        if (retval)
                return retval;

        /* Other security layers only want the primary executable */
        if (!bprm->called_set_creds) {
                retval = security_primary_bprm_set_creds(bprm);
                if (retval)
                        return retval;
                bprm->called_set_creds = 1;
        }

which now actually describes what is going on.

Then remove the 'called_set_creds' logic from the security layers, and
rename those 'xyz_bprm_set_creds()' to be
'xyz_primary_bprm_set_creds()'.

After that, and with a proper commit message that actually explains
this _properly_, this looks like a cleanup.

Because right now that patch description makes zero sense at all, and
the patch itself results in this insane situation where
"security_bprm_set_creds()" expressly doesn't call the basic
"cap_bprm_set_creds()" at all, which just makes things very very
confusing and the naming actively misleading.

                  Linus
On Sat, May 9, 2020 at 12:44 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Now that security_bprm_set_creds is no longer responsible for calling
> cap_bprm_set_creds, security_bprm_set_creds only does something for
> the primary file that is being executed (not any interpreters it may
> have). Therefore call security_bprm_set_creds from __do_execve_file,
> instead of from prepare_binprm so that it is only called once, and
> remove the now unnecessary called_set_creds field of struct binprm.
Ahh, good, this patch removes the 'called_set_creds' logic from the
security subsystems.
So it does half of what I asked for: please also just rename that
"security_bprm_set_creds()" to be "security_primary_bprm_set_creds()"
so that the change of semantics also shows up that way.
And so that there is no confusion about the fact that
"cap_bprm_set_creds()" has absolutely nothing to do with
"security_bprm_set_creds()" any more.
Linus
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Sat, May 9, 2020 at 12:44 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Now that security_bprm_set_creds is no longer responsible for calling
>> cap_bprm_set_creds, security_bprm_set_creds only does something for
>> the primary file that is being executed (not any interpreters it may
>> have). Therefore call security_bprm_set_creds from __do_execve_file,
>> instead of from prepare_binprm so that it is only called once, and
>> remove the now unnecessary called_set_creds field of struct binprm.
>
> Ahh, good, this patch removes the 'called_set_creds' logic from the
> security subsystems.
>
> So it does half of what I asked for: please also just rename that
> "security_bprm_set_creds()" to be "security_primary_bprm_set_creds()"
> so that the change of semantics also shows up that way.
>
> And so that there is no confusion about the fact that
> "cap_bprm_set_creds()" has absolutely nothing to do with
> "security_bprm_set_creds()" any more.
I agree something needs to be renamed, to remove confusion.
I am off for a nap now, and tomorrow is Mother's day so I probably won't
be back to this seriously until Monday. But please dissect these patches
and I will address any problems.
Eric
On Sat, May 9, 2020 at 12:45 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Instead of recursing in search_binary_handler have the methods that
> would recurse return a positive value, and simply loop in exec_binprm.
>
> This is a trivial change as all of the methods that would recurse do
> so as effectively the last thing they do. Making this a trivial code
> change.

Looks good.

I'd suggest doing that loop slightly differently:

> -       ret = search_binary_handler(bprm);
> +       do {
> +               depth++;
> +               ret = search_binary_handler(bprm);
> +               /* This allows 4 levels of binfmt rewrites before failing hard. */
> +               if ((ret > 0) && (depth > 5))
> +                       ret = -ELOOP;
> +       } while (ret > 0);
>         if (ret >= 0) {

That's really an odd way to write this.

So honestly, if "ret < 0", then we can just return directly.

So I think it would make much more sense to do this loop something like

        for (depth = 0; depth < 5; depth++) {
                int ret;

                ret = search_binary_handler(bprm);
                if (ret < 0)
                        return ret;

                /* Continue searching for the next binary handler? */
                if (ret > 0)
                        continue;

                /* Success! */
                audit_bprm(bprm);
                trace_sched_process_exec(current, old_pid, bprm);
                ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
                proc_exec_connector(current);
                return 0;
        }
        return -ELOOP;

(if I read the logic of exec_binprm() right - I might have missed something).

                 Linus
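
For comparison, the two loop shapes discussed above can be run side by
side in a small userspace model. This is only a sketch under stated
assumptions: fake_search(), exec_do_while() and exec_for_loop() are
hypothetical names, and the handler search is reduced to "return 1
while interpreter hops remain, then 0". The audit, trace and ptrace
calls are left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: each search consumes one pending interpreter hop. */
struct fake_bprm {
	int hops_left;	/* interpreter layers still to resolve */
};

/* Models search_binary_handler(): 1 = search again, 0 = binary loaded. */
static int fake_search(struct fake_bprm *bprm)
{
	if (bprm->hops_left > 0) {
		bprm->hops_left--;
		return 1;
	}
	return 0;
}

/* Loop shape from the patch: do/while with the depth check inside. */
int exec_do_while(int hops)
{
	struct fake_bprm bprm = { hops };
	int ret, depth = 0;

	assert(hops >= 0);
	do {
		depth++;
		ret = fake_search(&bprm);
		if ((ret > 0) && (depth > 5))
			ret = -ELOOP;
	} while (ret > 0);
	return ret;
}

/* Loop shape suggested in the reply: bounded for loop with early returns. */
int exec_for_loop(int hops)
{
	struct fake_bprm bprm = { hops };
	int depth;

	assert(hops >= 0);
	for (depth = 0; depth < 5; depth++) {
		int ret = fake_search(&bprm);

		if (ret < 0)
			return ret;
		if (ret > 0)
			continue;	/* look for the next handler */
		return 0;		/* success */
	}
	return -ELOOP;
}
```

In this model both shapes agree for up to four interpreter hops. At
five hops the do/while version still succeeds while the for-loop
version returns -ELOOP, so the bound of the for loop would need
adjusting if the limit in the patch is to be kept exactly.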
On Sat, May 9, 2020 at 1:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> I agree something needs to be renamed, to remove confusion.
Yeah, the alternative is to rename the capability version. I don't
care much which way it goes, although I do think it's best to call out
explicitly that the security hook functions get only the "primary"
executable bprm info.
Which is why I'd prefer to just rename all those low-level security
cases. It makes for a slightly bigger patch, but I think it makes for
better readability, and makes it explicit that that hook is literally
just for the primary executable, not for the interpreter or whatever.
Linus
On 2020/05/10 4:41, Eric W. Biederman wrote:
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -234,10 +234,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
>  	if (retval < 0)
>  		goto error;
>
> -	retval = search_binary_handler(bprm);
> -	if (retval < 0)
> -		goto error;
> -
> +	retval = 1; /* Search for the interpreter */
>  ret:
>  	dput(fmt->dentry);
>  	return retval;
Wouldn't this change cause
	if (fd_binary > 0)
		ksys_close(fd_binary);
	bprm->interp_flags = 0;
	bprm->interp_data = 0;
not to be called when "Search for the interpreter" failed?
On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Wouldn't this change cause
>
>         if (fd_binary > 0)
>                 ksys_close(fd_binary);
>         bprm->interp_flags = 0;
>         bprm->interp_data = 0;
>
> not to be called when "Search for the interpreter" failed?
Good catch. We seem to have some subtle magic wrt the fd_binary file
descriptor, which depends on the recursive behavior.
I'm not seeing how to fix it cleanly with the "turn it into a loop".
Basically, that binfmt_misc use-case isn't really a tail-call.
Eric, ideas?
Linus
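
The problem raised above can be illustrated with a small userspace
model. It is a sketch only: struct fake_bprm, misc_recursive(),
misc_looping() and fd_left_open() are hypothetical names, the
descriptor is a plain flag, and the later interpreter search is assumed
to fail with -ENOEXEC.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the state load_misc_binary() sets up. */
struct fake_bprm {
	int fd_installed;	/* models fd_binary being open */
	int interp_flags;	/* models bprm->interp_flags */
};

/* Models a later handler search that finds nothing usable. */
static int fake_interpreter_search(void)
{
	return -ENOEXEC;
}

/* Before the patch: the handler recurses and can undo its setup on error. */
static int misc_recursive(struct fake_bprm *bprm)
{
	int retval;

	bprm->fd_installed = 1;
	bprm->interp_flags = 1;

	retval = fake_interpreter_search();
	if (retval < 0) {
		/* models the cleanup under the "error:" label */
		bprm->fd_installed = 0;
		bprm->interp_flags = 0;
	}
	return retval;
}

/* After the patch: the handler returns 1 and never sees the failure. */
static int misc_looping(struct fake_bprm *bprm)
{
	bprm->fd_installed = 1;
	bprm->interp_flags = 1;
	return 1;	/* Search for the interpreter */
}

/* Returns 1 if the descriptor is still installed after a failed exec. */
int fd_left_open(int use_loop)
{
	struct fake_bprm bprm = { 0, 0 };
	int ret;

	if (use_loop) {
		ret = misc_looping(&bprm);
		if (ret > 0)
			ret = fake_interpreter_search();	/* caller's loop */
	} else {
		ret = misc_recursive(&bprm);
	}
	assert(ret < 0);	/* the exec fails in both variants */
	return bprm.fd_installed;
}
```

In the recursive variant the handler is still on the stack when the
inner search fails, so it can clean up. In the looping variant the
handler has already returned, so some other place would have to own
the cleanup.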
On Sat, May 09, 2020 at 02:41:17PM -0500, Eric W. Biederman wrote:
>
> Now that security_bprm_set_creds is no longer responsible for calling
> cap_bprm_set_creds, security_bprm_set_creds only does something for
> the primary file that is being executed (not any interpreters it may
> have). Therefore call security_bprm_set_creds from __do_execve_file,
> instead of from prepare_binprm so that it is only called once, and
> remove the now unnecessary called_set_creds field of struct binprm.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> fs/exec.c | 11 +++++------
> include/linux/binfmts.h | 6 ------
> security/apparmor/domain.c | 3 ---
> security/selinux/hooks.c | 2 --
> security/smack/smack_lsm.c | 3 ---
> security/tomoyo/tomoyo.c | 6 ------
> 6 files changed, 5 insertions(+), 26 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 765bfd51a546..635b5085050c 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1635,12 +1635,6 @@ int prepare_binprm(struct linux_binprm *bprm)
>
>  	bprm_fill_uid(bprm);
>
> -	/* fill in binprm security blob */
> -	retval = security_bprm_set_creds(bprm);
> -	if (retval)
> -		return retval;
> -	bprm->called_set_creds = 1;
> -
>  	retval = cap_bprm_set_creds(bprm);
>  	if (retval)
>  		return retval;
> @@ -1858,6 +1852,11 @@ static int __do_execve_file(int fd, struct filename *filename,
>  	if (retval < 0)
>  		goto out;
>
> +	/* fill in binprm security blob */
> +	retval = security_bprm_set_creds(bprm);
> +	if (retval)
> +		goto out;
> +
>  	retval = prepare_binprm(bprm);
>  	if (retval < 0)
>  		goto out;
>
Here I go with a Sunday night review, so hopefully I'm thinking better
than Friday night's review, but I *think* this patch is broken from
the LSM sense of the world in that security_bprm_set_creds() is getting
called _before_ the creds actually get fully set (in prepare_binprm()
by the calls to bprm_fill_uid(), cap_bprm_set_creds(), and
check_unsafe_exec()).
As a specific example, see the setting of LSM_UNSAFE_NO_NEW_PRIVS in
bprm->unsafe during check_unsafe_exec(), which must happen after
bprm_fill_uid(bprm) and cap_bprm_set_creds(bprm), to have a "true" view
of the execution privileges. Apparmor checks for this flag in its
security_bprm_set_creds() hook. Similarly do selinux, smack, etc...
The security_bprm_set_creds() boundary for LSM is to see the "final"
state of the process privileges, and that needs to happen after
bprm_fill_uid(), cap_bprm_set_creds(), and check_unsafe_exec() have all
finished.
So, as it stands, I don't think this will work, but perhaps it can still
be rearranged to avoid the called_set_creds silliness. I'll look more
this week...
-Kees
--
Kees Cook
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> Wouldn't this change cause
>>
>>         if (fd_binary > 0)
>>                 ksys_close(fd_binary);
>>         bprm->interp_flags = 0;
>>         bprm->interp_data = 0;
>>
>> not to be called when "Search for the interpreter" failed?
>
> Good catch. We seem to have some subtle magic wrt the fd_binary file
> descriptor, which depends on the recursive behavior.

Yes. Tetsuo, I really appreciate you noticing this. This is exactly
the kind of behavior I am trying to flush out and keep from being
hidden.

> I'm not seeing how to fix it cleanly with the "turn it into a loop".
> Basically, that binfmt_misc use-case isn't really a tail-call.

I have reservations about installing a new file descriptor before
we process the close-on-exec logic and the related logic in which
security modules close file descriptors that your new credentials no
longer give you access to.

I haven't yet figured out how opening a file descriptor during exec
should fit into all of that.

What I do see is that interp_data is just a parameter that is smuggled
into the call of search_binary_handler. And the next binary handler
needs to be binfmt_elf for it to make much sense, as only binfmt_elf
(and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.

So I think what needs to happen is to rename bprm->interp_data to
bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file
descriptor free_bprm's responsibility.

I hope such a change will make it easier to see all of the pieces that
are interacting during exec.

I am still asking: is the installation of that file descriptor useful
if it is not passed to userspace as an AT_EXECFD note?

I will dig in and see what I can come up with.

Eric
Kees Cook <keescook@chromium.org> writes:

> On Sat, May 09, 2020 at 02:41:17PM -0500, Eric W. Biederman wrote:
>>
>> Now that security_bprm_set_creds is no longer responsible for calling
>> cap_bprm_set_creds, security_bprm_set_creds only does something for
>> the primary file that is being executed (not any interpreters it may
>> have). Therefore call security_bprm_set_creds from __do_execve_file,
>> instead of from prepare_binprm so that it is only called once, and
>> remove the now unnecessary called_set_creds field of struct binprm.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  fs/exec.c                  | 11 +++++------
>>  include/linux/binfmts.h    |  6 ------
>>  security/apparmor/domain.c |  3 ---
>>  security/selinux/hooks.c   |  2 --
>>  security/smack/smack_lsm.c |  3 ---
>>  security/tomoyo/tomoyo.c   |  6 ------
>>  6 files changed, 5 insertions(+), 26 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 765bfd51a546..635b5085050c 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1635,12 +1635,6 @@ int prepare_binprm(struct linux_binprm *bprm)
>>
>>          bprm_fill_uid(bprm);
>>
>> -        /* fill in binprm security blob */
>> -        retval = security_bprm_set_creds(bprm);
>> -        if (retval)
>> -                return retval;
>> -        bprm->called_set_creds = 1;
>> -
>>          retval = cap_bprm_set_creds(bprm);
>>          if (retval)
>>                  return retval;
>> @@ -1858,6 +1852,11 @@ static int __do_execve_file(int fd, struct filename *filename,
>>          if (retval < 0)
>>                  goto out;
>>
>> +        /* fill in binprm security blob */
>> +        retval = security_bprm_set_creds(bprm);
>> +        if (retval)
>> +                goto out;
>> +
>>          retval = prepare_binprm(bprm);
>>          if (retval < 0)
>>                  goto out;
>>
>
> Here I go with a Sunday night review, so hopefully I'm thinking better
> than Friday night's review, but I *think* this patch is broken from
> the LSM sense of the world in that security_bprm_set_creds() is getting
> called _before_ the creds actually get fully set (in prepare_binprm()
> by the calls to bprm_fill_uid(), cap_bprm_set_creds(), and
> check_unsafe_exec()).
>
> As a specific example, see the setting of LSM_UNSAFE_NO_NEW_PRIVS in
> bprm->unsafe during check_unsafe_exec(), which must happen after
> bprm_fill_uid(bprm) and cap_bprm_set_creds(bprm), to have a "true" view
> of the execution privileges. Apparmor checks for this flag in its
> security_bprm_set_creds() hook. Similarly do selinux, smack, etc...

I think you are getting prepare_binprm confused with prepare_bprm_creds.
Understandable given the similarity of their names.

> The security_bprm_set_creds() boundary for LSM is to see the "final"
> state of the process privileges, and that needs to happen after
> bprm_fill_uid(), cap_bprm_set_creds(), and check_unsafe_exec() have all
> finished.
>
> So, as it stands, I don't think this will work, but perhaps it can still
> be rearranged to avoid the called_set_creds silliness. I'll look more
> this week...

If you look at the flow of the code in __do_execve_file before this
change it is:

    prepare_bprm_creds()
    check_unsafe_exec()

    ...

    prepare_binprm()
        bprm_file_uid()
            bprm->cred->euid = current_euid()
            bprm->cred->egid = current_egid()
        security_bprm_set_creds()
            for_each_lsm()
                lsm->bprm_set_creds()
                    if (called_set_creds)
                        return;
                    ...
        bprm->called_set_creds = 1;
    ...

    exec_binprm()
        search_binary_handler()
            security_bprm_check()
                tomoyo_bprm_check_security()
                ima_bprm_check()
            load_script()
                prepare_binprm()
                    /* called_set_creds already == 1 */
                    bprm_file_uid()
                    security_bprm_set_creds()
                        for_each_lsm()
                            lsm->bprm_set_creds()
                                if (called_set_creds)
                                    return;
                                ...
                search_binary_handler()
                    security_bprm_check_security()
                    load_elf_binary()
                        ...
                        setup_new_exec
                        ...

Assuming you are executing a shell script.

Now bprm_file_uid is written with the assumption that it will be called
multiple times and it reinitializes all of its variables each time.

As you can see above, the implementations of bprm_set_creds() only
really execute before called_set_creds is set, aka the first time.
They in no way see the final state.

Further, when I looked at those hooks they were not looking at the values
set by bprm_file_uid at all. They were busy with the values they
needed to set in that hook for their particular lsm.

So while in theory I can see the danger of moving above bprm_file_uid,
I don't see anything in practice that would be a problem.

Further, by moving the call of security_bprm_set_creds out of
prepare_binprm into __do_execve_file just before the call of
prepare_binprm, I am just moving the call above binprm_fill_uid
and nothing else.

So I think you just confused prepare_bprm_creds with prepare_binprm,
as most of your criticisms appear valid in that case. Can you take a
second look?

Thank you,
Eric
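To make the point about called_set_creds concrete, here is a minimal
user-space sketch of the pre-patch flow described above. It is not kernel
code: struct toy_bprm and every toy_* function are hypothetical stand-ins,
and the only thing modelled is that the uid/gid computation reruns on every
prepare_binprm() call while the LSM hook body runs only the first time.

```c
/* Hypothetical stand-in for struct linux_binprm; illustration only. */
struct toy_bprm {
    unsigned int called_set_creds:1;
    int fill_uid_runs;   /* how often the uid/gid computation ran */
    int lsm_hook_runs;   /* how often an LSM hook did real work */
};

/* Models an LSM bprm_set_creds hook guarded by called_set_creds. */
static int toy_lsm_set_creds(struct toy_bprm *bprm)
{
    if (bprm->called_set_creds)
        return 0;               /* second and later calls are no-ops */
    bprm->lsm_hook_runs++;
    return 0;
}

/* Models prepare_binprm() as it looked before the patch. */
static int toy_prepare_binprm(struct toy_bprm *bprm)
{
    int retval;

    bprm->fill_uid_runs++;      /* recomputed on every call */

    retval = toy_lsm_set_creds(bprm);
    if (retval)
        return retval;
    bprm->called_set_creds = 1;
    return 0;
}

/* Models exec of a script: one call for the script, one for its interpreter. */
static struct toy_bprm toy_exec_script(void)
{
    struct toy_bprm bprm = {0};

    toy_prepare_binprm(&bprm);  /* from the top-level exec path */
    toy_prepare_binprm(&bprm);  /* from the script handler */
    return bprm;
}
```

In this model the hook's real work happens exactly once no matter how many
interpreters are chained, which is why calling it once from the top-level
exec path gives the same result.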
On 5/11/20 9:33 AM, Eric W. Biederman wrote:
> What I do see is that interp_data is just a parameter that is smuggled
> into the call of search binary handler. And the next binary handler
> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.
The binfmt_elf_fdpic driver is separate from binfmt_elf for the same reason
ext2/ext3/ext4 used to have 3 drivers: fdpic is really just binfmt_elf with the
4 main sections (text, data, bss, rodata) able to move independently of each
other (each tracked with its own base pointer).
It's kind of -fPIE on steroids, and various security people have sniffed at it
over the years to give ASLR more degrees of freedom on with-MMU systems. Many
moons ago Rich Felker proposed teaching the fdpic loader how to load normal ELF
binaries so there's just the one loader (there's a flag in the ELF header to say
whether the sections are independent or not).
Rob
On Mon, May 11, 2020 at 11:52:41AM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
>
> > On Sat, May 09, 2020 at 02:41:17PM -0500, Eric W. Biederman wrote:
> >>
> >> Now that security_bprm_set_creds is no longer responsible for calling
> >> cap_bprm_set_creds, security_bprm_set_creds only does something for
> >> the primary file that is being executed (not any interpreters it may
> >> have). Therefore call security_bprm_set_creds from __do_execve_file,
> >> instead of from prepare_binprm so that it is only called once, and
> >> remove the now unnecessary called_set_creds field of struct binprm.
> >>
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> ---
> >>  fs/exec.c                  | 11 +++++------
> >>  include/linux/binfmts.h    |  6 ------
> >>  security/apparmor/domain.c |  3 ---
> >>  security/selinux/hooks.c   |  2 --
> >>  security/smack/smack_lsm.c |  3 ---
> >>  security/tomoyo/tomoyo.c   |  6 ------
> >>  6 files changed, 5 insertions(+), 26 deletions(-)
> >>
> >> diff --git a/fs/exec.c b/fs/exec.c
> >> index 765bfd51a546..635b5085050c 100644
> >> --- a/fs/exec.c
> >> +++ b/fs/exec.c
> >> @@ -1635,12 +1635,6 @@ int prepare_binprm(struct linux_binprm *bprm)
> >>
> >>          bprm_fill_uid(bprm);
> >>
> >> -        /* fill in binprm security blob */
> >> -        retval = security_bprm_set_creds(bprm);
> >> -        if (retval)
> >> -                return retval;
> >> -        bprm->called_set_creds = 1;
> >> -
> >>          retval = cap_bprm_set_creds(bprm);
> >>          if (retval)
> >>                  return retval;
> >> @@ -1858,6 +1852,11 @@ static int __do_execve_file(int fd, struct filename *filename,
> >>          if (retval < 0)
> >>                  goto out;
> >>
> >> +        /* fill in binprm security blob */
> >> +        retval = security_bprm_set_creds(bprm);
> >> +        if (retval)
> >> +                goto out;
> >> +
> >>          retval = prepare_binprm(bprm);
> >>          if (retval < 0)
> >>                  goto out;
> >>
> >
> > Here I go with a Sunday night review, so hopefully I'm thinking better
> > than Friday night's review, but I *think* this patch is broken from
> > the LSM sense of the world in that security_bprm_set_creds() is getting
> > called _before_ the creds actually get fully set (in prepare_binprm()
> > by the calls to bprm_fill_uid(), cap_bprm_set_creds(), and
> > check_unsafe_exec()).
> >
> > As a specific example, see the setting of LSM_UNSAFE_NO_NEW_PRIVS in
> > bprm->unsafe during check_unsafe_exec(), which must happen after
> > bprm_fill_uid(bprm) and cap_bprm_set_creds(bprm), to have a "true" view
> > of the execution privileges. Apparmor checks for this flag in its
> > security_bprm_set_creds() hook. Similarly do selinux, smack, etc...
>
> I think you are getting prepare_binprm confused with prepare_bprm_creds.
> Understandable given the similarity of their names.

I fixated on a bad example, having confused myself about when
check_unsafe_exec() happens. My original concern (with the bad example)
was that the LSM is having security_bprm_set_creds() called before the
new cred in bprm->cred has been initialized with all the correct
uid/gid, caps, and associated flags. But anything associated with
capabilities should be confined to the commoncap LSM, though there is
"leakage" into the uid/gid states and some bprm state (more on this
later).

That said, as you also found, I can't find any LSM that examines those
fields of the cred (I had stopped this research last night when I saw
check_unsafe_exec() and confused myself); they're all looking at other
bprm state not associated with caps and uid changes (file, unsafe_exec,
security field of new cred, etc). So that's very good! That means we've
actually kept a bright line between things here -- whew.

> > The security_bprm_set_creds() boundary for LSM is to see the "final"
> > state of the process privileges, and that needs to happen after
> > bprm_fill_uid(), cap_bprm_set_creds(), and check_unsafe_exec() have all
> > finished.
> >
> > So, as it stands, I don't think this will work, but perhaps it can still
> > be rearranged to avoid the called_set_creds silliness. I'll look more
> > this week...
>
> If you look at the flow of the code in __do_execve_file before this
> change it is:
>
>     prepare_bprm_creds()
>     check_unsafe_exec()
>
>     ...
>
>     prepare_binprm()
>         bprm_file_uid()

(bprm_fill_uid(), but yes)

>             bprm->cred->euid = current_euid()
>             bprm->cred->egid = current_egid()
>         security_bprm_set_creds()
>             for_each_lsm()
>                 lsm->bprm_set_creds()
>                     if (called_set_creds)
>                         return;
>                     ...
>         bprm->called_set_creds = 1;
>     ...
>
>     exec_binprm()
>         search_binary_handler()
>             security_bprm_check()
>                 tomoyo_bprm_check_security()
>                 ima_bprm_check()
>             load_script()
>                 prepare_binprm()
>                     /* called_set_creds already == 1 */
>                     bprm_file_uid()
>                     security_bprm_set_creds()
>                         for_each_lsm()
>                             lsm->bprm_set_creds()
>                                 if (called_set_creds)
>                                     return;
>                                 ...
>                 search_binary_handler()
>                     security_bprm_check_security()
>                     load_elf_binary()
>                         ...
>                         setup_new_exec
>                         ...
>
>
> Assuming you are executing a shell script.
>
> Now bprm_file_uid is written with the assumption that it will be called
> multiple times and it reinitializes all of it's variables each time.

Right -- and the same is true for cap_bprm_set_creds() (in that it needs
to be run multiple times and depends on the work done in
bprm_fill_uid()). If we encounter a future use-case for having other
LSMs call out here multiple times, we can introduce a new LSM hook.

> As you can see in above the implementations of bprm_set_creds() only
> really execute before called_set_creds is set, aka the first time.
> They in no way see the final state.
>
> Further when I looked as those hooks they were not looking at the values
> set by bprm_file_uid at all. There were busy with the values their
> they needed to set in that hook for their particular lsm.

Agreed (though I'd love some other LSM eyes on this conclusion).

> So while in theory I can see the danger of moving above bprm_file_uid
> I don't see anything in practice that would be a problem.
>
> Further by moving the call of security_bprm_set_creds out of
> prepare_binprm int __do_execve_file just before the call of
> prepare_binprm I am just moving the call above binprm_fill_uid
> and nothing else.
>
> So I think you just confused prepare_bprm_creds with prepare_binprm.
> As most of your criticisms appear valid in that case. Can you take a
> second look?

So, in earlier attempts to clean up code near all this, I removed the
LSM's bprm_secureexec hook, which only commoncap was using to impart
details about privilege elevation. I switched the semantics to having
LSMs set bprm->secureexec to true (but never to zero). Since commoncap's
idea of "was I elevated?" might repeatedly change, I had to store its
results "privately" in the bprm, which got us cap_elevated (in
46d98eb4e1d2):

c425e189ffd7 ("binfmt: Introduce secureexec flag")
993b3ab0642e ("apparmor: Refactor to remove bprm_secureexec hook")
62874c3adf70 ("selinux: Refactor to remove bprm_secureexec hook")
46d98eb4e1d2 ("commoncap: Refactor to remove bprm_secureexec hook")
ee67ae7ef6ff ("commoncap: Move cap_elevated calculation into bprm_set_creds")
2af622802696 ("LSM: drop bprm_secureexec hook")

So, given the special-case nature of capabilities here, this does seem
to be the right choice (assuming we're not missing something in the
other LSMs). As such, I think the comment for cap_elevated needs to be
updated to reflect the change to function call flow, and to specify it
cannot be used by the other LSMs. Maybe something like:

        /*
         * True if most recent call to cap_bprm_set_creds()
         * (due to multiple prepare_binprm() calls from the
         * binfmt_script/misc handlers) resulted in elevated
         * privileges. This is used internally by fs/exec.c
         * to set bprm->secureexec.
         */
        cap_elevated:1,

And that brings us to naming. Whee. I think we should make the
following name changes:

    bprm_fill_uid -> bprm_establish_privileges
    cap_bprm_set_creds -> cap_establish_privileges

Finally, I think we should update the comment on bprm_set_creds (which,
actually, I think is the correct name now) to something like:

 * @bprm_set_creds:
 *      Save security information in the @bprm->cred->security field,
 *      typically based on information about the bprm->file, for later
 *      use during the @bprm_committing_creds hook. Specifically
 *      the credentials themselves (uid, gid, etc), are not finalized
 *      yet and must not be examined until the @bprm_committing_creds
 *      hook.
 *      This hook is called once, after the creds structure has been
 *      allocated.
 *      The hook must set @bprm->secureexec to 1 if a "secure exec"
 *      has happened as a result of this hook call. The flag is used to
 *      indicate the need for a sanitized execution environment, and is
 *      also passed in the ELF auxiliary table on the initial stack to
 *      indicate whether libc should enable secure mode.
 *      This hook may also optionally check LSM-specific permissions
 *      (e.g. for transitions between security domains).
 *      @bprm contains the linux_binprm structure.
 *      Return 0 if the hook is successful and permission is granted.

-Kees

--
Kees Cook
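The difference between the two flags discussed above (secureexec, which
LSMs only ever set, and cap_elevated, which is recomputed on every pass)
can be shown with a small user-space model. This is a sketch under stated
assumptions, not kernel code: the toy_* names are hypothetical, and the
final "fold cap_elevated into secureexec" step is an assumption drawn from
the proposed comment saying fs/exec.c uses cap_elevated to set
bprm->secureexec.

```c
/* Hypothetical stand-in for the two bprm flags under discussion. */
struct toy_bprm {
    unsigned int cap_elevated:1;  /* result of the most recent caps pass */
    unsigned int secureexec:1;    /* sticky: set by LSMs, never cleared */
};

/* Models the once-only LSM hook: it may set secureexec, never clears it. */
static void toy_lsm_set_creds(struct toy_bprm *bprm, int lsm_wants_secure)
{
    if (lsm_wants_secure)
        bprm->secureexec = 1;
}

/* Models the per-file capability pass: the result is overwritten each time. */
static void toy_cap_set_creds(struct toy_bprm *bprm, int file_elevates)
{
    bprm->cap_elevated = file_elevates ? 1 : 0;
}

/* Models the final step in exec: fold the last caps result into secureexec. */
static void toy_finish_exec(struct toy_bprm *bprm)
{
    bprm->secureexec |= bprm->cap_elevated;
}

/*
 * Runs one exec: the LSM hook once, then one caps pass per file in the
 * chain (script first, final interpreter last). Returns final secureexec.
 */
static int toy_exec(int lsm_wants_secure, const int *elevates, int nfiles)
{
    struct toy_bprm bprm = {0};
    int i;

    toy_lsm_set_creds(&bprm, lsm_wants_secure);
    for (i = 0; i < nfiles; i++)
        toy_cap_set_creds(&bprm, elevates[i]);
    toy_finish_exec(&bprm);
    return bprm.secureexec;
}
```

In this model only the last file in the chain decides cap_elevated, while
a secureexec request from an LSM survives regardless of how many later
passes run.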
On Mon, May 11, 2020 at 09:33:21AM -0500, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >> Wouldn't this change cause
> >>
> >> if (fd_binary > 0)
> >> ksys_close(fd_binary);
> >> bprm->interp_flags = 0;
> >> bprm->interp_data = 0;
> >>
> >> not to be called when "Search for the interpreter" failed?
> >
> > Good catch. We seem to have some subtle magic wrt the fd_binary file
> > descriptor, which depends on the recursive behavior.
>
> Yes. I Tetsuo I really appreciate you noticing this. This is exactly
> the kind of behavior I am trying to flush out and keep from being
> hidden.
>
> > I'm not seeing how to fix it cleanly with the "turn it into a loop".
> > Basically, that binfmt_misc use-case isn't really a tail-call.
>
> I have reservations about installing a new file descriptor before
> we process the close on exec logic and the related security modules
> closing file descriptors that your new credentials no longer give
> you access to logic.

Hm, this does feel odd. In looking at this, it seems like this file
never gets close-on-exec set, and doesn't have its flags changed from
its original open:

        .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,

only the UMH path through exec doesn't explicitly open a file by name
from what I can see, so we'll only have these flags.

> I haven't yet figured out how opening a file descriptor during exec
> should fit into all of that.
>
> What I do see is that interp_data is just a parameter that is smuggled
> into the call of search binary handler. And the next binary handler
> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.
>
> So I think what needs to happen is to rename bprm->interp_data to
> bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file
> descriptor free_bprm's responsiblity.

Yeah, I would agree. As far as the close handling, I don't think there
is a difference here: interp_data was closed on the binfmt_misc.c
error path, and in the new world it would be the exec error path -- both
would be under the original credentials.

> I hope such a change will make it easier to see all of the pieces that
> are intereacting during exec.

Right -- I'm not sure which piece should "consume" bprm->execfd though,
which I think is what you're asking next...

> I am still asking: is the installation of that file descriptor useful if
> it is not exported passed to userspace as an AT_EXECFD note?
>
> I will dig in and see what I can come up with.

Should binfmt_misc do the install, or can the consuming binfmt do it?
i.e. when binfmt_elf sees bprm->execfd, does it perform the install
instead?

--
Kees Cook
On Sat, May 09, 2020 at 02:42:23PM -0500, Eric W. Biederman wrote:
>
> Add a flag preserve_creds that binfmt_misc can set to prevent
> credentials from being updated. This allows binfmrt_misc to always
> call prepare_binfmt. Allowing the credential computation logic to be
> consolidated.
>
> Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/binfmt_misc.c        | 15 +++------------
>  fs/exec.c               | 14 +++++++++-----
>  include/linux/binfmts.h |  2 ++
>  3 files changed, 14 insertions(+), 17 deletions(-)
>
> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> index 127fae9c21ab..16bfafd2671d 100644
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -218,19 +218,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
>                  goto error;
>
>          bprm->file = interp_file;
> -        if (fmt->flags & MISC_FMT_CREDENTIALS) {
> -                loff_t pos = 0;
> -
> -                /*
> -                 * No need to call prepare_binprm(), it's already been
> -                 * done. bprm->buf is stale, update from interp_file.
> -                 */
> -                memset(bprm->buf, 0, BINPRM_BUF_SIZE);
> -                retval = kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE,
> -                                     &pos);
> -        } else
> -                retval = prepare_binprm(bprm);
> +        if (fmt->flags & MISC_FMT_CREDENTIALS)
> +                bprm->preserve_creds = 1;
>
> +        retval = prepare_binprm(bprm);
>          if (retval < 0)
>                  goto error;
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 8bbf5fa785a6..01dbeb025c46 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1630,14 +1630,18 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
>   */
>  int prepare_binprm(struct linux_binprm *bprm)
>  {
> -        int retval;
>          loff_t pos = 0;
>
> -        bprm_fill_uid(bprm);
> +        if (!bprm->preserve_creds) {

nit: hint this to the common execution path:

        if (likely(!bprm->preserve_creds)) {

> +                int retval;
>
> -        retval = cap_bprm_set_creds(bprm);
> -        if (retval)
> -                return retval;
> +                bprm_fill_uid(bprm);
> +
> +                retval = cap_bprm_set_creds(bprm);
> +                if (retval)
> +                        return retval;
> +        }
> +        bprm->preserve_creds = 0;
>
>          memset(bprm->buf, 0, BINPRM_BUF_SIZE);
>          return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 89f1135dcb75..cb016f001e7a 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -26,6 +26,8 @@ struct linux_binprm {
>          unsigned long p; /* current top of mem */
>          unsigned long argmin; /* rlimit marker for copy_strings() */
>          unsigned int
> +                /* Don't update the creds for an interpreter (see binfmt_misc) */

I'd like a much more verbose comment here. How about this:

                /*
                 * Skip setting new privileges for an interpreter (see
                 * binfmt_misc) on the next call to prepare_binprm().
                 */

> +                preserve_creds:1,

Nit pick: we've seen there is a logical difference here between "creds"
(which mean "the creds struct itself") and "privileges" (which are
stored in the cred struct). I think we should reinforce this distinction
here and name this:

                preserve_privileges:1,

>                  /*
>                   * True if most recent call to the commoncaps bprm_set_creds
>                   * hook (due to multiple prepare_binprm() calls from the
> --
> 2.25.0
>

Otherwise, yeah, this seems okay to me.

--
Kees Cook
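The one-shot behaviour of the preserve_creds flag in the patch quoted
above can be modelled in user space. The following is a minimal sketch,
not kernel code: struct toy_bprm and the toy_* functions are hypothetical,
and the counters merely stand in for the credential computation and the
kernel_read() of the file header.

```c
/* Hypothetical stand-in for struct linux_binprm; illustration only. */
struct toy_bprm {
    unsigned int preserve_creds:1; /* skip the next credential update */
    int creds_updates;             /* times credentials were recomputed */
    int header_reads;              /* times the file header was re-read */
};

/* Models prepare_binprm() after the patch. */
static int toy_prepare_binprm(struct toy_bprm *bprm)
{
    if (!bprm->preserve_creds)
        bprm->creds_updates++;     /* stands in for the uid/gid and caps pass */
    bprm->preserve_creds = 0;      /* the flag only covers a single call */

    bprm->header_reads++;          /* always refresh the header buffer */
    return 0;
}

/* Models the binfmt_misc handler: the credentials flag decides the request. */
static int toy_load_misc_binary(struct toy_bprm *bprm, int fmt_credentials)
{
    if (fmt_credentials)
        bprm->preserve_creds = 1;
    return toy_prepare_binprm(bprm);
}
```

Because the flag is cleared inside toy_prepare_binprm(), a handler that
forgets to reset it cannot accidentally suppress a later credential
update.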
On Sat, May 09, 2020 at 02:42:52PM -0500, Eric W. Biederman wrote:
>
> The code in prepare_binary_handler needs to be run every time
> search_binary_handler is called so move the call into search_binary_handler
> itself to make the code simpler and easier to understand.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Yes, nice. :) I don't see any ordering dependencies here. The only thing
I see is a potential for more "work done by kernel before bailing" in
the sense that the arg copying will be performed before we check the
kernel_read() result. I struggle to see how that might be a problem,
and this gets us to fewer exec.c exports. Yay!
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
Kees Cook <keescook@chromium.org> writes:
> On Mon, May 11, 2020 at 09:33:21AM -0500, Eric W. Biederman wrote:
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>>
>> > On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa
>> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> >>
>> >> Wouldn't this change cause
>> >>
>> >> if (fd_binary > 0)
>> >> ksys_close(fd_binary);
>> >> bprm->interp_flags = 0;
>> >> bprm->interp_data = 0;
>> >>
>> >> not to be called when "Search for the interpreter" failed?
>> >
>> > Good catch. We seem to have some subtle magic wrt the fd_binary file
>> > descriptor, which depends on the recursive behavior.
>>
>> Yes. I Tetsuo I really appreciate you noticing this. This is exactly
>> the kind of behavior I am trying to flush out and keep from being
>> hidden.
>>
>> > I'm not seeing how to fix it cleanly with the "turn it into a loop".
>> > Basically, that binfmt_misc use-case isn't really a tail-call.
>>
>> I have reservations about installing a new file descriptor before
>> we process the close on exec logic and the related security modules
>> closing file descriptors that your new credentials no longer give
>> you access to logic.
>
> Hm, this does feel odd. In looking at this, it seems like this file
> never gets close-on-exec set, and doesn't have its flags changed from
> its original open:
> .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
> only the UMH path through exec doesn't explicitly open a file by name
> from what I can see, so we'll only have these flags.
>
>> I haven't yet figured out how opening a file descriptor during exec
>> should fit into all of that.
>>
>> What I do see is that interp_data is just a parameter that is smuggled
>> into the call of search binary handler. And the next binary handler
>> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
>> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.
>>
>> So I think what needs to happen is to rename bprm->interp_data to
>> bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file
>> descriptor free_bprm's responsiblity.
>
> Yeah, I would agree. As far as the close handling, I don't think there
> is a difference here: it interp_data was closed on the binfmt_misc.c
> error path, and in the new world it would be the exec error path -- both
> would be under the original credentials.
>
>> I hope such a change will make it easier to see all of the pieces that
>> are intereacting during exec.
>
> Right -- I'm not sure which piece should "consume" bprm->execfd though,
> which I think is what you're asking next...
>
>> I am still asking: is the installation of that file descriptor useful if
>> it is not exported passed to userspace as an AT_EXECFD note?
>>
>> I will dig in and see what I can come up with.
>
> Should binfmt_misc do the install, or can the consuming binfmt do it?
> i.e. when binfmt_elf sees bprm->execfd, does it perform the install
> instead?
I am still thinking about this one, but here is where I am at. At a
practical level passing the file descriptor of the script to the interpreter
seems like something we should encourage in the long term. It removes
races and it is cheaper because then the interpreter does not have to
turn around and open the script itself.
Strictly speaking binfmt_misc should not need to close the file
descriptor in binfmt_misc because we have already unshared the files
struct and reset_files_struct should handle restoring it.
Calling fd_install in binfmt_misc still seems wrong, as that exposes
the new file descriptor to user space with the old creds.
It is possible although unlikely for userspace to find the file
descriptor without consulting AT_EXECFD so just to be conservative I
think we should install the file descriptor in begin_new_exec even if
the next interpreter does not support AT_EXECFD.
I am still working on how to handle recursive binfmts but I suspect it
is just a matter of having an array of struct files in struct
linux_binprm.
Eric
On Tue, May 12, 2020 at 01:42:53PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> > Should binfmt_misc do the install, or can the consuming binfmt do it?
> > i.e. when binfmt_elf sees bprm->execfd, does it perform the install
> > instead?
>
> I am still thinking about this one, but here is where I am at. At a
> practical level passing the file descriptor of the script to interpreter
> seems like something we should encourage in the long term. It removes
> races and it is cheaper because then the interpreter does not have to
> turn around and open the script itself.

Yeah, this does sound pretty good, though I have concerns about doing
it for a process that isn't expecting it. I've seen a lot of bad code
make assumptions about initial fd numbers. :(

> Strictly speaking binfmt_misc should not need to close the file
> descriptor in binfmt_misc because we have already unshared the files
> struct and reset_files_struct should handle restoring it.

If I get what you mean, I agree. The error case is fine.

> Calling fd_install in binfmt_misc still seems wrong, as that exposes
> the new file descriptor to user space with the old creds.

I haven't dug into the details here -- is there a real risk here? The
old creds are what opened the file originally for the exec. Are you
thinking about executable-but-not-readable files?

> It is possible although unlikely for userspace to find the file
> descriptor without consulting AT_EXECFD so just to be conservative I
> think we should install the file descriptor in begin_new_exec even if
> the next interpreter does not support AT_EXECFD.

I think universally installing the fd needs to be a distinct patch --
it's going to have a lot of consequences, IMO. We can certainly deal
with them, but I don't think it should be part of this clean-up series.

> I am still working on how to handle recursive binfmts but I suspect it
> is just a matter of having an array of struct files in struct
> linux_binprm.

If install is left in binfmt_misc, then the recursive problem goes away,
yes?

--
Kees Cook
Kees Cook <keescook@chromium.org> writes:

> On Tue, May 12, 2020 at 01:42:53PM -0500, Eric W. Biederman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>> > Should binfmt_misc do the install, or can the consuming binfmt do it?
>> > i.e. when binfmt_elf sees bprm->execfd, does it perform the install
>> > instead?
>>
>> I am still thinking about this one, but here is where I am at. At a
>> practical level passing the file descriptor of the script to interpreter
>> seems like something we should encourage in the long term. It removes
>> races and it is cheaper because then the interpreter does not have to
>> turn around and open the script itself.
>
> Yeah, this does sounds pretty good, though I have concerns about doing
> it for a process that isn't expecting it. I've seen a lot of bad code
> make assumptions about initial fd numbers. :(

Yes. That is definitely a concern.

>> Strictly speaking binfmt_misc should not need to close the file
>> descriptor in binfmt_misc because we have already unshared the files
>> struct and reset_files_struct should handle restoring it.
>
> If I get what you mean, I agree. The error case is fine.
>
>> Calling fd_install in binfmt_misc still seems wrong, as that exposes
>> the new file descriptor to user space with the old creds.
>
> I haven't dug into the details here -- is there a real risk here? The
> old creds are what opened the file originally for the exec. Are you
> thinking about executable-but-not-readable files?

I am thinking about looking in /proc/<pid>/fd and maybe opening those
files. That access is gated by ptrace_may_access which is gated by the
process credentials. So I know strictly speaking it is wrong.

I think you are correct that it would only allow access to a file that
could be accessed another way. Even execveat at a quick glance appears
to go through the ordinary permission checks of open.

The current code is definitely a maintenance pitfall as it installs state
into the process early.

>> It is possible although unlikely for userspace to find the file
>> descriptor without consulting AT_EXECFD so just to be conservative I
>> think we should install the file descriptor in begin_new_exec even if
>> the next interpreter does not support AT_EXECFD.
>
> I think universally installing the fd needs to be a distinct patch --
> it's going to have a lot of consequences, IMO. We can certainly deal
> with them, but I don't think it should be part of this clean-up series.

I meant generically installing the fd not universally installing it.

>> I am still working on how to handle recursive binfmts but I suspect it
>> is just a matter of having an array of struct files in struct
>> linux_binprm.
>
> If install is left if binfmt_misc, then the recursive problem goes away,
> yes?

I don't think leaving the install in binfmt_misc is responsible at this
point.

Eric
On Tue, May 12, 2020 at 03:31:57PM -0500, Eric W. Biederman wrote:
> >> It is possible although unlikely for userspace to find the file
> >> descriptor without consulting AT_EXECFD so just to be conservative I
> >> think we should install the file descriptor in begin_new_exec even if
> >> the next interpreter does not support AT_EXECFD.
> >
> > I think universally installing the fd needs to be a distinct patch --
> > it's going to have a lot of consequences, IMO. We can certainly deal
> > with them, but I don't think it should be part of this clean-up series.
>
> I meant generically installing the fd not universally installing it.
>
> >> I am still working on how to handle recursive binfmts but I suspect it
> >> is just a matter of having an array of struct files in struct
> >> linux_binprm.
> >
> > If install is left if binfmt_misc, then the recursive problem goes away,
> > yes?
>
> I don't think leaving the install in binfmt_misc is responsible at this
> point.
I'm nearly certain the answer is "yes", but I wonder if we should stop
for a moment and ask "does anything still use MISC_FMT_OPEN_BINARY?" It
looks like it is set by either the "O" or the "C" binfmt_misc
registration flag. My installed binfmts on Ubuntu don't use them...
I'm currently pulling a list of all the packages in Debian that depend
on the binfmt-support package and checking their flags.
--
Kees Cook
On Tue, May 12, 2020 at 04:08:56PM -0700, Kees Cook wrote:
> I'm nearly certain the answer is "yes", but I wonder if we should stop
> for a moment and ask "does anything still use MISC_FMT_OPEN_BINARY ? It
> looks like either "O" or "C" binfmt_misc registration flag. My installed
> binfmts on Ubuntu don't use them...
>
> I'm currently pulling a list of all the packages in Debian than depend
> on the binfmt-support package and checking their flags.
So, binfmt-support in Debian doesn't _support_ MISC_FMT_OPEN_BINARY
("O"):
    credentials =
        (binfmt->credentials && !strcmp (binfmt->credentials, "yes"))
        ? "C" : "";
    preserve = (binfmt->preserve && !strcmp (binfmt->preserve, "yes"))
        ? "P" : "";
    fix_binary =
        (binfmt->fix_binary && !strcmp (binfmt->fix_binary, "yes"))
        ? "F" : "";
    ...
    regstring = xasprintf (":%s:%c:%s:%s:%s:%s:%s%s%s\n",
                           name, type, binfmt->offset, binfmt->magic,
                           binfmt->mask, interpreter,
                           credentials, preserve, fix_binary);
However, "credentials" ("C") does imply MISC_FMT_OPEN_BINARY.
I looked at every Debian package using binfmt-support, and "only" qemu
uses "credentials".
And now I wonder if qemu actually uses the resulting AT_EXECFD ...
--
Kees Cook
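For readers unfamiliar with the registration string being assembled in
the binfmt-support excerpt above, here is a minimal sketch that builds one
with the same field layout (:name:type:offset:magic:mask:interpreter:flags).
It uses plain snprintf() rather than binfmt-support's xasprintf(), and the
function name and all example values are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Builds a binfmt_misc registration string into buf.
 * Returns the string length, or -1 if it did not fit.
 * Hypothetical helper for illustration; not part of binfmt-support.
 */
static int toy_build_regstring(char *buf, size_t len,
                               const char *name, char type,
                               const char *offset, const char *magic,
                               const char *mask, const char *interpreter,
                               int credentials, int preserve, int fix_binary)
{
    int n = snprintf(buf, len, ":%s:%c:%s:%s:%s:%s:%s%s%s\n",
                     name, type, offset, magic, mask, interpreter,
                     credentials ? "C" : "",  /* "C" also implies open-binary */
                     preserve ? "P" : "",
                     fix_binary ? "F" : "");

    if (n < 0 || (size_t)n >= len)
        return -1;                            /* encoding error or truncated */
    return n;
}
```

Note that the sketch, like the excerpt it mirrors, has no way to emit a
bare "O" flag; the open-binary behaviour is only reachable through "C".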
On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
> And now I wonder if qemu actually uses the resulting AT_EXECFD ...
It does, though I'm not sure if this is to support crossing mount points,
dropping privileges, or something else, since it does fall back to just
trying to open the file.
    execfd = qemu_getauxval(AT_EXECFD);
    if (execfd == 0) {
        execfd = open(filename, O_RDONLY);
        if (execfd < 0) {
            printf("Error while loading %s: %s\n", filename, strerror(errno));
            _exit(EXIT_FAILURE);
        }
    }
--
Kees Cook
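The same lookup-then-fallback pattern can be written against glibc
directly. The sketch below is an assumption-laden illustration, not qemu's
code: it swaps qemu_getauxval() for glibc's getauxval() (Linux, glibc 2.16
or later), and toy_open_program() is a hypothetical name. As in the qemu
excerpt, a value of 0 is treated as "no AT_EXECFD entry", so a descriptor
numbered 0 cannot be told apart from a missing entry.

```c
#include <elf.h>       /* AT_EXECFD */
#include <fcntl.h>     /* open(), O_RDONLY */
#include <sys/auxv.h>  /* getauxval() */

/*
 * Returns a descriptor for the program to run: the one the kernel passed
 * in the auxiliary vector if present, otherwise a fresh open() of filename.
 * Returns -1 (with errno set by open()) if neither is available.
 */
static int toy_open_program(const char *filename)
{
    unsigned long execfd = getauxval(AT_EXECFD);

    if (execfd != 0)
        return (int)execfd;            /* descriptor installed by the kernel */

    return open(filename, O_RDONLY);   /* fall back to opening by name */
}
```

When the process was not started through a binfmt_misc entry that opens
the binary, the auxiliary vector has no AT_EXECFD entry and the fallback
path is taken.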
On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> I am still thinking about this one, but here is where I am at. At a
> practical level passing the file descriptor of the script to interpreter
> seems like something we should encourage in the long term. It removes
> races and it is cheaper because then the interpreter does not have to
> turn around and open the script itself.

Yeah, I think we should continue to support it, because I think it's
the right thing to do (and we might just end up having compatibility
issues if we don't).

How about trying to move the logic to the common code, out of binfmt_misc?

IOW, how about something very similar to your "bprm->preserve_creds"
thing that you did for the credentials (also for binfmt_misc, which
shouldn't surprise anybody: binfmt_misc is simply the "this is the
generic thing for letting user mode do the final details").

> Calling fd_install in binfmt_misc still seems wrong, as that exposes
> the new file descriptor to user space with the old creds.

Right. And it really would be good to simply not have these kinds of
very special cases inside the low-level binfmt code: I'd much rather
have the special cases in the generic code, so that we see what the
ordering is etc. One of the big problems with all these binfmt
callbacks has been the fact that it makes it so hard to think about
and change the generic code, because the low-level binfmt handlers all
do their own special thing.

So moving it to generic code would likely simplify things from that
angle, even if the actual complexity of the feature itself remains.

Besides, we really have exposed this to other code anyway thanks to
that whole bprm->interp_data thing, and the AT_EXECFD AUX entries that
we have. So it's not really "internal" to binfmt_misc _anyway_.

So how about we just move the fd_binary logic to the generic execve
code, and just have binfmt_misc set the flag for "yes, please do this",
exactly like "preserve_creds"?

> It is possible although unlikely for userspace to find the file
> descriptor without consulting AT_EXECFD so just to be conservative I
> think we should install the file descriptor in begin_new_exec even if
> the next interpreter does not support AT_EXECFD.

Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
binfmt_misc, but it also shouldn't be gating this issue. In reality,
ELF is the only real binary format that matters - the script/misc
binfmts are just indirection entries - and it supports AT_EXECFD, so
let's just ignore the theoretical case of "maybe nobody exposes it".

So yes, just make it part of begin_new_exec(), and there's no reason
to support more than a single fd. No stacks or arrays of these things
required, I feel. It's not like AT_EXECFD supports the notion of
multiple fd's being reported anyway, nor does it make any sense to
have some kind of nested misc->misc binfmt nesting.

So making that whole interp_data and fd_binary thing be a generic
layer thing would make the search_binary_handler() code in binfmt_misc
be a pure tailcall too, and then the conversion to a loop ends up
working and being the right thing. No?

Linus
On 5/12/20 7:20 PM, Linus Torvalds wrote:
> On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> I am still thinking about this one, but here is where I am at. At a
>> practical level passing the file descriptor of the script to interpreter
>> seems like something we should encourage in the long term. It removes
>> races and it is cheaper because then the interpreter does not have to
>> turn around and open the script itself.
>
> Yeah, I think we should continue to support it, because I think it's
> the right thing to do (and we might just end up having compatibility
> issues if we don't).
...
>> It is possible although unlikely for userspace to find the file
>> descriptor without consulting AT_EXECFD so just to be conservative I
>> think we should install the file descriptor in begin_new_exec even if
>> the next interpreter does not support AT_EXECFD.
>
> Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
> binfmt_misc, but it also shouldn't be gating this issue. In reality,
> ELF is the only real binary format that matters - the script/misc
> binfmts are just indirection entries - and it supports AT_EXECFD, so
> let's just ignore the theoretical case of "maybe nobody exposes it".

Would this potentially make the re-exec-yourself case easier to do at some
point? (Which nommu needs to do, and /proc/self/exe isn't always available.)

Here's the first time I asked about that:

https://lore.kernel.org/lkml/200612261823.07927.rob@landley.net/

Here's the most recent:

https://lkml.org/lkml/2017/9/5/246

Here's someone else asking and being basically told "chroot isn't a thing":

http://lkml.iu.edu/hypermail/linux/kernel/0906.3/00584.html

(See also "CVE-2019-5736" and the workarounds thereto.)

Rob

P.S. Yes I'm aware it would only work properly with static binaries. Not the
first thing that's true for.
On Tue, May 12, 2020 at 7:32 PM Rob Landley <rob@landley.net> wrote:
>
> On 5/12/20 7:20 PM, Linus Torvalds wrote:
> > Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
> > binfmt_misc, but it also shouldn't be gating this issue. In reality,
> > ELF is the only real binary format that matters - the script/misc
> > binfmts are just indirection entries - and it supports AT_EXECFD, so
> > let's just ignore the theoretical case of "maybe nobody exposes it".
>
> Would this potentially make the re-exec-yourself case easier to do at some
> point? (Which nommu needs to do, and /proc/self/exe isn't always available.)
AT_EXECFD may be an ELF thing, but normal ELF binaries don't do that
"we have a fd". So it only triggers for binfmt_misc (and only when the
flag is set for "I want the fd").
So no, this wouldn't help re-exec-yourself in general.
Although I guess we could add an ELF section note that does that whole
"executable fd" thing for other things too.
Everything is possible in theory..
Linus
Rob Landley <rob@landley.net> writes:
> On 5/11/20 9:33 AM, Eric W. Biederman wrote:
>> What I do see is that interp_data is just a parameter that is smuggled
>> into the call of search binary handler. And the next binary handler
>> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
>> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.
>
> The binfmt_elf_fdpic driver is separate from binfmt_elf for the same reason
> ext2/ext3/ext4 used to have 3 drivers: fdpic is really just binfmt_elf with the
> 4 main sections (text, data, bss, rodata) able to move independently of each
> other (each tracked with its own base pointer).
>
> It's kind of -fPIE on steroids, and various security people have sniffed at it
> over the years to give ASLR more degrees of freedom on with-MMU systems. Many
> moons ago Rich Felker proposed teaching the fdpic loader how to load normal ELF
> binaries so there's just the one loader (there's a flag in the ELF header to say
> whether the sections are independent or not).
Careful with your terminology. ELF sections are for .o's. For
executables ELF has segments. And reading through the code it is the
program segments that are independently relocatable.
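To make the section/segment distinction concrete, here is a minimal userspace sketch that walks the program header table of an ELF file and counts its PT_LOAD segments, which are the units a loader maps. The function name count_load_segments is made up for illustration, and the sketch assumes the file has the same ELF class (32 or 64 bit) as the program reading it and that e_phentsize matches the native program header size.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <fcntl.h>
#include <link.h>      /* ElfW() selects the native 32/64-bit ELF types */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the PT_LOAD program headers (segments) of an
 * ELF file. Returns the count, or -1 if the file cannot be read or does
 * not start with the ELF magic. Section headers are never consulted. */
int count_load_segments(const char *path)
{
	ElfW(Ehdr) ehdr;
	int count = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (pread(fd, &ehdr, sizeof(ehdr), 0) != (ssize_t)sizeof(ehdr) ||
	    memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) {
		close(fd);
		return -1;
	}
	for (unsigned int i = 0; i < ehdr.e_phnum; i++) {
		ElfW(Phdr) phdr;
		off_t off = (off_t)ehdr.e_phoff + (off_t)i * ehdr.e_phentsize;

		if (pread(fd, &phdr, sizeof(phdr), off) != (ssize_t)sizeof(phdr)) {
			close(fd);
			return -1;
		}
		if (phdr.p_type == PT_LOAD)
			count++;
	}
	close(fd);
	return count;
}
```

Calling count_load_segments("/proc/self/exe") on Linux reports how many loadable segments the running binary has.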
There is a flag but it is defined per architecture and I don't think any
of the architectures define it.
I looked at ARM and apparently with an MMU ARM turns fdpic binaries into
PIE executables. I am not certain why.
The registers passed to the entry point are also different for both
cases.
I think it would have been nice if the fdpic support had used a
different ELF type, instead of a different depending on using a
different architecture.
All that aside the core dumping code looks to be essentially the same
between binfmt_elf.c and binfmt_elf_fdpic.c. Do you think people would
be interested in refactoring binfmt_elf.c and binfmt_elf_fdpic.c so that
they could share the same core dumping code?
Eric
Kees Cook <keescook@chromium.org> writes:
> On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
>> And now I wonder if qemu actually uses the resulting AT_EXECFD ...
>
> It does, though I'm not sure if this is to support crossing mount points,
> dropping privileges, or something else, since it does fall back to just
> trying to open the file.
>
>     execfd = qemu_getauxval(AT_EXECFD);
>     if (execfd == 0) {
>         execfd = open(filename, O_RDONLY);
>         if (execfd < 0) {
>             printf("Error while loading %s: %s\n", filename, strerror(errno));
>             _exit(EXIT_FAILURE);
>         }
>     }
My hunch is that the fallback exists from a time when the kernel did not
implement AT_EXECFD, or so that qemu can run on kernels that don't
implement AT_EXECFD. It doesn't really matter unless the executable is
suid, or otherwise changes privileges.
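For an interpreter that is not qemu, the same pattern can be sketched with glibc's getauxval(). This is only an illustration of the fallback described above; open_program_fd is a hypothetical helper name, and like the qemu snippet it treats a value of 0 as "no AT_EXECFD entry", since getauxval() returns 0 when the entry is absent.

```c
#define _GNU_SOURCE
#include <elf.h>       /* AT_EXECFD */
#include <fcntl.h>
#include <sys/auxv.h>  /* getauxval() */

/* Hypothetical helper: prefer the descriptor the kernel passed in the
 * auxiliary vector, and fall back to opening the path by name only when
 * no AT_EXECFD entry is present. Returns a file descriptor or -1. */
int open_program_fd(const char *filename)
{
	unsigned long execfd = getauxval(AT_EXECFD);

	if (execfd != 0)
		return (int)execfd;

	/* Older kernel, or binfmt_misc entry without the "pass an fd" flag. */
	return open(filename, O_RDONLY);
}
```

The fallback path reopens the file by name, so it is the branch that is exposed to the rename and symlink races discussed in this thread.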
I looked into this a bit to remind myself why exec works the way it
works, with changing privileges.
The classic attack is pointing a symlink at a #! script that is suid or
otherwise changes privileges. The kernel will open the script and set
the privileges, read the interpreter from the first line, and proceed to
exec the interpreter. The interpreter will then open the script using
the pathname supplied by the kernel. The name of the symlink.
Before the interpreter reopens the script the attack would replace
the symlink with a script that does something else, but gets to run
with the privileges of the script.
Defending against that time of check vs time of use attack is why
bprm_fill_uid, and cap_bprm_set_creds use the credentials derived from
the interpreter instead of the credentials derived from the script.
The other defense is to replace the pathname of the executable that the
interpreter will open with /dev/fd/N.
All of this predates Linux entirely. I do remember this was fixed at
some point in Linux but I don't remember the details. I can just read
the solution that was picked in the code.
All of this makes me wonder how are the LSMs protected against this
attack.
Let's see the following LSMs implement bprm_set_creds:
tomoyo   - Abuses bprm_set_creds to call tomoyo_load_policy  [ safe ]
smack    - Requires CAP_MAC_ADMIN to set smack xattrs        [ vulnerable? ]
           Uses those xattrs in smack_bprm_set_creds
apparmor - Everything is based on names so the symlink       [ safe? ]
           attack won't work as it has the wrong name.
           As long as the trusted names can't be renamed
           apparmor appears good.
selinux  - Appears to let anyone set selinux xattrs          [ safe? ]
           Requires permission for a sid transfer
           As the attack appears not to allow anything that
           would not be allowed anyway it looks like selinux
           is safe.

LSM folks, especially Casey, am I reading this correctly? Did I
correctly infer how your LSMs deal with the time of check to time of use
attack on the script name?
Eric
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> I am still thinking about this one, but here is where I am at. At a
>> practical level passing the file descriptor of the script to interpreter
>> seems like something we should encourage in the long term. It removes
>> races and it is cheaper because then the interpreter does not have to
>> turn around and open the script itself.
>
> Yeah, I think we should continue to support it, because I think it's
> the right thing to do (and we might just end up having compatibility
> issues if we don't).
>
> How about trying to move the logic to the common code, out of binfmt_misc?
>
> IOW, how about something very similar to your "bprm->preserve_creds"
> thing that you did for the credentials (also for binfmt_misc, which
> shouldn't surprise anybody: binfmt_misc is simply the "this is the
> generic thing for letting user mode do the final details").
>
>> Calling fd_install in binfmt_misc still seems wrong, as that exposes
>> the new file descriptor to user space with the old creds.
>
> Right. And it really would be good to simply not have these kinds of
> very special cases inside the low-level binfmt code: I'd much rather
> have the special cases in the generic code, so that we see what the
> ordering is etc. One of the big problems with all these binfmt
> callbacks has been the fact that it makes it so hard to think about
> and change the generic code, because the low-level binfmt handlers all
> do their own special thing.
>
> So moving it to generic code would likely simplify things from that
> angle, even if the actual complexity of the feature itself remains.
>
> Besides, we really have exposed this to other code anyway thanks to
> that whole bprm->interp_data thing, and the AT_EXECFD AUX entries that
> we have. So it's not really "internal" to binfmt_misc _anyway_.
>
> So how about we just move the fd_binary logic to the generic execve
> code, and just binfmt_misc set the flag for "yes, please do this",
> exactly like "preserve_creds"?
>
>> It is possible although unlikely for userspace to find the file
>> descriptor without consulting AT_EXECFD so just to be conservative I
>> think we should install the file descriptor in begin_new_exec even if
>> the next interpreter does not support AT_EXECFD.
>
> Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
> binfmt_misc, but it also shouldn't be gating this issue. In reality,
> ELF is the only real binary format that matters - the script/misc
> binfmts are just indirection entries - and it supports AT_EXECFD, so
> let's just ignore the theoretical case of "maybe nobody exposes it".
>
> So yes, just make it part of begin_new_exec(), and there's no reason
> to support more than a single fd. No stacks or arrays of these things
> required, I feel. It's not like AT_EXECFD supports the notion of
> multiple fd's being reported anyway, nor does it make any sense to
> have some kind of nested misc->misc binfmt nesting.
>
> So making that whole interp_data and fd_binary thing be a generic
> layer thing would make the search_binary_handler() code in binfmt_misc
> be a pure tailcall too, and then the conversion to a loop ends up
> working and being the right thing.
That is pretty much what I have been thinking. I have just been taking
it slow so I find as many funny corner cases as I can.
Nothing ever clears the BINPRM_FLAGS_EXECFD so the current code can
not support nesting.
Now I do think a nested misc->misc binfmt thing can make sense in
principle. I have an old DOS Spectrum emulator that I use to play some
of the games that I grew up with. Running that emulator makes me two
emulators deep. I can also imagine writing a domain specific language
in Python or Perl, and setting things up so scripts in the domain
specific language can be run directly.

So I think I need to deliberately test and prevent a nested misc->misc,
just so data structures don't get stomped. If the cases where it could
be useful prove sufficiently interesting we can enable them later.
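As a rough userspace model of what is being discussed (a loop instead of recursion, a single execfd slot, and an explicit refusal of nested misc->misc), consider the sketch below. Every name in it (fmt_kind, image, exec_state, exec_loop) is hypothetical and none of it is kernel code. The choice of -ENOEXEC for the nested case and the depth limit parameter are assumptions made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the handler kinds involved. */
enum fmt_kind { FMT_ELF, FMT_SCRIPT, FMT_MISC_EXECFD };

struct image {                       /* one file in the interpreter chain */
	enum fmt_kind kind;
	const struct image *interp;  /* next interpreter, NULL for FMT_ELF */
};

struct exec_state {                  /* stand-in for struct linux_binprm */
	const struct image *file;        /* file currently being examined */
	const struct image *execfd_for;  /* file to hand over via the one execfd */
	bool have_execfd;
};

/* Walk the chain iteratively. Returns 0 once a final (ELF) handler is
 * reached, -ENOEXEC for a broken chain or a second execfd request, and
 * -ELOOP when the chain is longer than max_depth indirections. */
int exec_loop(struct exec_state *st, int max_depth)
{
	for (int depth = 0; depth <= max_depth; depth++) {
		const struct image *cur = st->file;

		if (cur->kind == FMT_ELF)
			return 0;
		if (cur->interp == NULL)
			return -ENOEXEC;
		if (cur->kind == FMT_MISC_EXECFD) {
			/* Only one slot exists: refuse nested misc->misc. */
			if (st->have_execfd)
				return -ENOEXEC;
			st->have_execfd = true;
			st->execfd_for = cur;
		}
		st->file = cur->interp;  /* reassign and go around again */
	}
	return -ELOOP;
}
```

Because the slot is checked before it is written, a second handler asking for an execfd fails cleanly instead of overwriting the first request.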
Eric
On 5/14/2020 7:56 AM, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
>
>> On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
>>> And now I wonder if qemu actually uses the resulting AT_EXECFD ...
>> It does, though I'm not sure if this is to support crossing mount points,
>> dropping privileges, or something else, since it does fall back to just
>> trying to open the file.
>>
>>     execfd = qemu_getauxval(AT_EXECFD);
>>     if (execfd == 0) {
>>         execfd = open(filename, O_RDONLY);
>>         if (execfd < 0) {
>>             printf("Error while loading %s: %s\n", filename, strerror(errno));
>>             _exit(EXIT_FAILURE);
>>         }
>>     }
> My hunch is that the fallback exists from a time when the kernel did not
> implement AT_EXECFD, or so that qemu can run on kernels that don't
> implement AT_EXECFD. It doesn't really matter unless the executable is
> suid, or otherwise changes privileges.
>
>
> I looked into this a bit to remind myself why exec works the way it
> works, with changing privileges.
>
> The classic attack is pointing a symlink at a #! script that is suid or
> otherwise changes privileges. The kernel will open the script and set
> the privileges, read the interpreter from the first line, and proceed to
> exec the interpreter. The interpreter will then open the script using
> the pathname supplied by the kernel. The name of the symlink.
> Before the interpreter reopens the script the attack would replace
> the symlink with a script that does something else, but gets to run
> with the privileges of the script.
>
>
> Defending against that time of check vs time of use attack is why
> bprm_fill_uid, and cap_bprm_set_creds use the credentials derived from
> the interpreter instead of the credentials derived from the script.
>
>
> The other defense is to replace the pathname of the executable that the
> interpreter will open with /dev/fd/N.
>
> All of this predates Linux entirely. I do remember this was fixed at
> some point in Linux but I don't remember the details. I can just read
> the solution that was picked in the code.
>
>
>
> All of this makes me wonder how are the LSMs protected against this
> attack.
>
> Let's see the following LSMs implement bprm_set_creds:
> tomoyo   - Abuses bprm_set_creds to call tomoyo_load_policy  [ safe ]
> smack    - Requires CAP_MAC_ADMIN to set smack xattrs        [ vulnerable? ]
>            Uses those xattrs in smack_bprm_set_creds

What is the concern? If the xattrs change after the check,
the behavior should still be consistent.

> apparmor - Everything is based on names so the symlink       [ safe? ]
>            attack won't work as it has the wrong name.
>            As long as the trusted names can't be renamed
>            apparmor appears good.
> selinux  - Appears to let anyone set selinux xattrs          [ safe? ]
>            Requires permission for a sid transfer
>            As the attack appears not to allow anything that
>            would not be allowed anyway it looks like selinux
>            is safe.
>
> LSM folks, especially Casey, am I reading this correctly? Did I
> correctly infer how your LSMs deal with the time of check to time of use
> attack on the script name?
>
> Eric
>
Casey Schaufler <casey@schaufler-ca.com> writes:
> On 5/14/2020 7:56 AM, Eric W. Biederman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>>
>>> On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
>>>> And now I wonder if qemu actually uses the resulting AT_EXECFD ...
>>> It does, though I'm not sure if this is to support crossing mount points,
>>> dropping privileges, or something else, since it does fall back to just
>>> trying to open the file.
>>>
>>>     execfd = qemu_getauxval(AT_EXECFD);
>>>     if (execfd == 0) {
>>>         execfd = open(filename, O_RDONLY);
>>>         if (execfd < 0) {
>>>             printf("Error while loading %s: %s\n", filename, strerror(errno));
>>>             _exit(EXIT_FAILURE);
>>>         }
>>>     }
>> My hunch is that the fallback exists from a time when the kernel did not
>> implement AT_EXECFD, or so that qemu can run on kernels that don't
>> implement AT_EXECFD. It doesn't really matter unless the executable is
>> suid, or otherwise changes privileges.
>>
>>
>> I looked into this a bit to remind myself why exec works the way it
>> works, with changing privileges.
>>
>> The classic attack is pointing a symlink at a #! script that is suid or
>> otherwise changes privileges. The kernel will open the script and set
>> the privileges, read the interpreter from the first line, and proceed to
>> exec the interpreter. The interpreter will then open the script using
>> the pathname supplied by the kernel. The name of the symlink.
>> Before the interpreter reopens the script the attack would replace
>> the symlink with a script that does something else, but gets to run
>> with the privileges of the script.
>>
>>
>> Defending against that time of check vs time of use attack is why
>> bprm_fill_uid, and cap_bprm_set_creds use the credentials derived from
>> the interpreter instead of the credentials derived from the script.
>>
>>
>> The other defense is to replace the pathname of the executable that the
>> interpreter will open with /dev/fd/N.
>>
>> All of this predates Linux entirely. I do remember this was fixed at
>> some point in Linux but I don't remember the details. I can just read
>> the solution that was picked in the code.
>>
>>
>>
>> All of this makes me wonder how are the LSMs protected against this
>> attack.
>>
>> Let's see the following LSMs implement bprm_set_creds:
>> tomoyo   - Abuses bprm_set_creds to call tomoyo_load_policy  [ safe ]
>> smack    - Requires CAP_MAC_ADMIN to set smack xattrs        [ vulnerable? ]
>>            Uses those xattrs in smack_bprm_set_creds
>
> What is the concern? If the xattrs change after the check,
> the behavior should still be consistent.
The concern is that there are xattrs set on a #! script. Someone
replaces the script after smack reads the xattr and sets bprm->cred but
before the interpreter reopens the script.
In short, if there is one script with xattrs set, I can run any script as
if those xattrs were set on it.
I don't know the smack security model well enough to know if that
is a problem or not. It looks like it may be a concern because smack
limits who can mess with its security xattrs.
Eric
On 5/13/20 4:59 PM, Eric W. Biederman wrote:
> Careful with your terminology. ELF sections are for .o's. For
> executables ELF has segments. And reading through the code it is the
> program segments that are independently relocatable.

Sorry, I have trouble keeping this stuff straight when it's not in front of me.
(I have a paperback copy of the old "linkers and loaders" book and it was the
driest thing I have _ever_ slogged through. Back before the Linux Foundation ate
the FSG I was pushing https://refspecs.linuxbase.org/ to include missing ABI
supplement, I have copies of ones it doesn't collected from now long-dead
sites...)

But more recently I've just made puppy eyes at Rich Felker to have him fix this
stuff for me, because I do _not_ retain the terminology here. REL vs RELA vs
PLT, can you have a PLT without a GOT...?

> There is a flag but it is defined per architecture and I don't think any
> of the architectures define it.

They all check for one, but I don't remember there being a #define. I have a
todo item to check more architectures' fdpic binaries, this was from sh2eb (ala
j-core):

https://github.com/landley/toybox/commit/d61aeaf9e#diff-4442ddbb8949R65

There was the out of tree arm fdpic toolchain from the french guys for cortex-m,
and the original frv paper, and in theory blackfin but nothing they touched ever
got merged upstream anywhere:

In _theory_ you could do fdpic for x86, but as with u-boot for x86 nobody ever
bothers because it's got an x86-only solution. (And then the x86 version of
stuff gets pushed to other platforms because all our device tree files were
GPLed so of course acpi for arm became a thing. Sigh...)

> I looked at ARM and apparently with an MMU ARM turns fdpic binaries into
> PIE executables. I am not certain why.

Falling back to a more widely tested codepath, I expect. Also maybe it saves 3
registers if all 4 are using the same base register? Map them linearly and it
becomes "single base + offset"? Which of course loses the extra ASLR benefits
the security people wanted, but "undoing what the security people want in the
name of an unmeasurable microbenchmark optimization" is a proud tradition.

Just because the 4 segments are compiled as independently relocatable doesn't
mean they HAVE to be. (You'd think the code would be using different register
numbers to index stuff so you'd STILL be using 4 registers, but I haven't looked
at what arm's doing...)

> The registers passed to the entry point are also different for both
> cases.

From the same machine code chunks? I boggle at what the ld.so fixup is doing
then...

> I think it would have been nice if the fdpic support had used a
> different ELF type, instead of a different depending on using a
> different architecture.

This is what you get when a blackfin developer talks to the gnu/binutils
developers:

https://sourceware.org/legacy-ml/binutils/2008-04/msg00350.html

> All that aside the core dumping code looks to be essentially the same
> between binfmt_elf.c and binfmt_elf_fdpic.c. Do you think people would
> be interested in refactoring binfmt_elf.c and binfmt_elf_fdpic.c so that
> they could share the same core dumping code?

I think merging the two of them together entirely would be a good idea, and
anything that can collapse together I'm happy to regression test on sh2.

I also note that qemu-sh4eb can run these binaries, maybe I can whip up a
qemu-system-sh4eb that runs a nommu fdpic userspace...

[hours later]

Ok, here's me asking Rich Felker a question:

>>> So fdpic binaries run under qemu-sh2eb and there's a qemu-system-sh2eb that
>>> SHOULD also be able to run them under the r2d board emulation, and the kernel
>>> builds fine under the sh2eb compiler but I can't enable fdpic support without
>>> CONFIG_NOMMU, and if I yank that dependency from Kconfig (which only sh2 has,
>>> arm and such do fdpic with or without mmu) the build breaks with:
>>>
>>> /home/landley/toybox/clean/ccc/sh2eb-linux-muslfdpic-cross/bin/sh2eb-linux-muslfdpic-ld:
>>> fs/binfmt_elf_fdpic.o: in function `load_elf_fdpic_binary':
>>> binfmt_elf_fdpic.c:(.text+0x1734): undefined reference to
>>> `elf_fdpic_arch_lay_out_mm'
>>>
>>> The problem is if I switch off CONFIG_MMU in the kernel, buckets of stuff in the
>>> r2d board kernel config changes and suddenly I don't get serial output from the
>>> qemu-system-sh2eb -M r2d boot anymore. Before it was running the kernel but just
>>> failing to run init...

And his response:

>> I don't think qemu-system-sh4eb can boot a nommu kernel. But you don't
>> need to in order to do userspace-only testing. Just build a normal
>> sh4eb kernel. It doesn't need CONFIG_BINFMT_ELF_FDPIC. The normal ELF
>> loader can load FDPIC just fine, because a valid FDPIC ELF file is a
>> valid ELF file, just with more constraints (in same sense a square is
>> a rectangle). The normal ELF loader won't independently float the text
>> and data segments, but that's okay because your emulated system has an
>> MMU and can just map them adjacently like they show up in the ELF file
>> with their untransformed addresses.
>>
>> Now that I think about it, it's possible that the ARM folks broke this
>> when adding support for enabling CONFIG_BINFMT_ELF_FDPIC with MMU. If
>> so, and you find you really do need the FDPIC loader now because they
>> made the normal ELF loader refuse to do it, I think it will suffice to
>> copy the ARM version of elf_fdpic_arch_lay_out_mm from
>> arch/arm/kernel/elf.c to somewhere it will be compiled on SH.

I.E. testing the kernel fdpic loader under qemu is NOT EASY (because the fdpic
loader refuses to build in a with-mmu context, and the relevant board emulations
refuse to build without), but it can fall back to the conventional ELF loader
which collates the segments and treats fdpic as PIE? (Which... is how qemu-sh2eb
application emulation is loading them...?)

Which was news to me...

> Eric

Rob
It is hard to follow the control flow in exec.c as the code has evolved over
time and something that used to work one way now works another. This set of
changes attempts to address the worst of that, to remove unnecessary work
and to make the code a little easier to follow.

The churn is a bit higher than the last version of this patchset, with
renaming and cleaning up of comments. I have split security_bprm_set_creds
into security_bprm_creds_for_exec and security_bprm_repopulate_creds. My
goal was to make it clear that one hook completes its work while the other
recalculates its work each time a new interpreter is selected.

I have added a new change at the beginning to make it clear that neither
security_bprm_creds_for_exec nor security_bprm_repopulate_creds needs to be
implemented as prepare_exec_creds properly does the work of setting up
credentials unless something special is going on.

I have made the execfd support generic and moved it out of binfmt_misc so
that I can remove the recursion. I have moved reassigning bprm->file into
the loop that replaces the recursion. In doing so I discovered that
binfmt_misc was naughty and was returning -ENOEXEC in such a way that the
search_binary_handler loop could not continue. So I added a change
to remove that naughtiness.

Eric W. Biederman (8):
      exec: Teach prepare_exec_creds how exec treats uids & gids
      exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
      exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
      exec: Allow load_misc_binary to call prepare_binfmt unconditionally
      exec: Move the call of prepare_binprm into search_binary_handler
      exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
      exec: Generic execfd support
      exec: Remove recursion from search_binary_handler

 arch/alpha/kernel/binfmt_loader.c  | 11 +----
 fs/binfmt_elf.c                    |  4 +-
 fs/binfmt_elf_fdpic.c              |  4 +-
 fs/binfmt_em86.c                   | 13 +----
 fs/binfmt_misc.c                   | 69 ++++-----------------------
 fs/binfmt_script.c                 | 82 ++++++++++++++------------------
 fs/exec.c                          | 97 ++++++++++++++++++++++++++------------
 include/linux/binfmts.h            | 36 ++++++--------
 include/linux/lsm_hook_defs.h      |  3 +-
 include/linux/lsm_hooks.h          | 52 +++++++++++---------
 include/linux/security.h           | 14 ++++--
 kernel/cred.c                      |  3 ++
 security/apparmor/domain.c         |  7 +--
 security/apparmor/include/domain.h |  2 +-
 security/apparmor/lsm.c            |  2 +-
 security/commoncap.c               |  9 ++--
 security/security.c                |  9 +++-
 security/selinux/hooks.c           |  8 ++--
 security/smack/smack_lsm.c         |  9 ++--
 security/tomoyo/tomoyo.c           | 12 ++---
 20 files changed, 202 insertions(+), 244 deletions(-)
It is almost possible to use the result of prepare_exec_creds with no
modifications during exec. Update prepare_exec_creds to initialize
the suid and the fsuid to the euid, and the sgid and the fsgid to the
egid. This is all that is needed to handle the common case of exec
when nothing special like a setuid exec is happening.

That this preserves the existing behavior of exec can be verified
by examining bprm_fill_uid and cap_bprm_set_creds.

This change makes it clear that the later parts of exec that
update bprm->cred are just needed to handle special cases such
as setuid exec and change of domains.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/cred.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cred.c b/kernel/cred.c
index 71a792616917..421b1149c651 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -315,6 +315,9 @@ struct cred *prepare_exec_creds(void)
 	new->process_keyring = NULL;
 #endif
 
+	new->suid = new->fsuid = new->euid;
+	new->sgid = new->fsgid = new->egid;
+
 	return new;
 }
 
-- 
2.25.0
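The effect of the added assignments can be modeled in userspace. The struct and function below are hypothetical stand-ins (exec_cred_model is not struct cred, and model_prepare_exec_creds is not the kernel function). They only illustrate that the real uid and gid carry over unchanged while the saved and filesystem ids are reset from the effective ids.

```c
#include <sys/types.h>

/* Hypothetical subset of the id fields the patch touches. */
struct exec_cred_model {
	uid_t uid, euid, suid, fsuid;
	gid_t gid, egid, sgid, fsgid;
};

/* Model of the common, non-setuid exec case: start from a copy of the
 * old credentials, then make the saved and fs ids follow the effective
 * ids, mirroring the assignments added to prepare_exec_creds. */
void model_prepare_exec_creds(struct exec_cred_model *new_cred,
			      const struct exec_cred_model *old)
{
	*new_cred = *old;
	new_cred->suid = new_cred->fsuid = new_cred->euid;
	new_cred->sgid = new_cred->fsgid = new_cred->egid;
}
```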
Today security_bprm_set_creds has several implementations:
apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
smack_bprm_set_creds, and tomoyo_bprm_set_creds.

Except for cap_bprm_set_creds they all test bprm->called_set_creds and
return immediately if it is true. The function cap_bprm_set_creds
ignores bprm->called_set_creds entirely.

Create a new LSM hook security_bprm_creds_for_exec that is called just
before prepare_binprm in __do_execve_file, resulting in an LSM hook
that is called exactly once for the entirety of exec. Move the bits
of security_bprm_set_creds that only want to be called once per exec
into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
behind.

Remove bprm->called_set_creds as all of its former users have been moved
to security_bprm_creds_for_exec.

Add or update comments as appropriate to bring them up to date and
to reflect this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                          |  6 +++-
 include/linux/binfmts.h            | 18 +++--------
 include/linux/lsm_hook_defs.h      |  1 +
 include/linux/lsm_hooks.h          | 50 +++++++++++++++++-------------
 include/linux/security.h           |  6 ++++
 security/apparmor/domain.c         |  7 ++---
 security/apparmor/include/domain.h |  2 +-
 security/apparmor/lsm.c            |  2 +-
 security/security.c                |  5 +++
 security/selinux/hooks.c           |  8 ++---
 security/smack/smack_lsm.c         |  9 ++----
 security/tomoyo/tomoyo.c           | 12 ++-----
 12 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 14b786158aa9..9e70da47f8d9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1640,7 +1640,6 @@ int prepare_binprm(struct linux_binprm *bprm)
 	retval = security_bprm_set_creds(bprm);
 	if (retval)
 		return retval;
-	bprm->called_set_creds = 1;
 
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
@@ -1855,6 +1854,11 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval < 0)
 		goto out;
 
+	/* Set the unchanging part of bprm->cred */
+	retval = security_bprm_creds_for_exec(bprm);
+	if (retval)
+		goto out;
+
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		goto out;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 1b48e2154766..d1217fcdedea 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -27,22 +27,14 @@ struct linux_binprm {
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
 		/*
-		 * True after the bprm_set_creds hook has been called once
-		 * (multiple calls can be made via prepare_binprm() for
-		 * binfmt_script/misc).
-		 */
-		called_set_creds:1,
-		/*
-		 * True if most recent call to the commoncaps bprm_set_creds
-		 * hook (due to multiple prepare_binprm() calls from the
-		 * binfmt_script/misc handlers) resulted in elevated
-		 * privileges.
+		 * True if most recent call to cap_bprm_set_creds
+		 * resulted in elevated privileges.
 		 */
 		cap_elevated:1,
 		/*
-		 * Set by bprm_set_creds hook to indicate a privilege-gaining
-		 * exec has happened. Used to sanitize execution environment
-		 * and to set AT_SECURE auxv for glibc.
+		 * Set by bprm_creds_for_exec hook to indicate a
+		 * privilege-gaining exec has happened. Used to set
+		 * AT_SECURE auxv for glibc.
 		 */
 		secureexec:1,
 		/*
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 9cd4455528e5..aab0695f41df 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -49,6 +49,7 @@ LSM_HOOK(int, 0, syslog, int type)
 LSM_HOOK(int, 0, settime, const struct timespec64 *ts,
 	 const struct timezone *tz)
 LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages)
+LSM_HOOK(int, 0, bprm_creds_for_exec, struct linux_binprm *bprm)
 LSM_HOOK(int, 0, bprm_set_creds, struct linux_binprm *bprm)
 LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 988ca0df7824..c719af37df20 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -34,40 +34,46 @@
  *
  * Security hooks for program execution operations.
  *
+ * @bprm_creds_for_exec:
+ *	If the setup in prepare_exec_creds did not setup @bprm->cred->security
+ *	properly for executing @bprm->file, update the LSM's portion of
+ *	@bprm->cred->security to be what commit_creds needs to install for the
+ *	new program. This hook may also optionally check permissions
+ *	(e.g. for transitions between security domains).
+ *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
+ *	request libc enable secure mode.
+ *	@bprm contains the linux_binprm structure.
+ *	Return 0 if the hook is successful and permission is granted.
  * @bprm_set_creds:
- *	Save security information in the bprm->security field, typically based
- *	on information about the bprm->file, for later use by the apply_creds
- *	hook. This hook may also optionally check permissions (e.g. for
+ *	Assuming that the relevant bits of @bprm->cred->security have been
+ *	previously set, examine @bprm->file and regenerate them. This is
+ *	so that the credentials derived from the interpreter the code is
+ *	actually going to run are used rather than credentials derived
+ *	from a script. This done because the interpreter binary needs to
+ *	reopen script, and may end up opening something completely different.
+ *	This hook may also optionally check permissions (e.g. for
  *	transitions between security domains).
- *	This hook may be called multiple times during a single execve, e.g. for
- *	interpreters. The hook can tell whether it has already been called by
- *	checking to see if @bprm->security is non-NULL. If so, then the hook
- *	may decide either to retain the security information saved earlier or
- *	to replace it. The hook must set @bprm->secureexec to 1 if a "secure
- *	exec" has happened as a result of this hook call. The flag is used to
- *	indicate the need for a sanitized execution environment, and is also
- *	passed in the ELF auxiliary table on the initial stack to indicate
- *	whether libc should enable secure mode.
+ *	The hook must set @bprm->cap_elevated to 1 if AT_SECURE should be set to
+ *	request libc enable secure mode.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
  * @bprm_check_security:
  *	This hook mediates the point when a search for a binary handler will
- *	begin. It allows a check the @bprm->security value which is set in the
- *	preceding set_creds call. The primary difference from set_creds is
- *	that the argv list and envp list are reliably available in @bprm. This
- *	hook may be called multiple times during a single execve; and in each
- *	pass set_creds is called first.
+ *	begin. It allows a check against the @bprm->cred->security value
+ *	which was set in the preceding creds_for_exec call. The argv list and
+ *	envp list are reliably available in @bprm. This hook may be called
+ *	multiple times during a single execve.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
  * @bprm_committing_creds:
  *	Prepare to install the new security attributes of a process being
  *	transformed by an execve operation, based on the old credentials
  *	pointed to by @current->cred and the information set in @bprm->cred by
- *	the bprm_set_creds hook. @bprm points to the linux_binprm structure.
- *	This hook is a good place to perform state changes on the process such
- *	as closing open file descriptors to which access will no longer be
- *	granted when the attributes are changed. This is called immediately
- *	before commit_creds().
+ *	the bprm_creds_for_exec hook. @bprm points to the linux_binprm
+ *	structure. This hook is a good place to perform state changes on the
+ *	process such as closing open file descriptors to which access will no
+ *	longer be granted when the attributes are changed. This is called
+ *	immediately before commit_creds().
  * @bprm_committed_creds:
  *	Tidy up after the installation of the new security attributes of a
  *	process being transformed by an execve operation.
The new credentials diff --git a/include/linux/security.h b/include/linux/security.h index a8d9310472df..1bd7a6582775 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -276,6 +276,7 @@ int security_quota_on(struct dentry *dentry); int security_syslog(int type); int security_settime64(const struct timespec64 *ts, const struct timezone *tz); int security_vm_enough_memory_mm(struct mm_struct *mm, long pages); +int security_bprm_creds_for_exec(struct linux_binprm *bprm); int security_bprm_set_creds(struct linux_binprm *bprm); int security_bprm_check(struct linux_binprm *bprm); void security_bprm_committing_creds(struct linux_binprm *bprm); @@ -569,6 +570,11 @@ static inline int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) return __vm_enough_memory(mm, pages, cap_vm_enough_memory(mm, pages)); } +static inline int security_bprm_creds_for_exec(struct linux_binprm *bprm) +{ + return 0; +} + static inline int security_bprm_set_creds(struct linux_binprm *bprm) { return cap_bprm_set_creds(bprm); diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c index 6ceb74e0f789..0b870a647488 100644 --- a/security/apparmor/domain.c +++ b/security/apparmor/domain.c @@ -854,14 +854,14 @@ static struct aa_label *handle_onexec(struct aa_label *label, } /** - * apparmor_bprm_set_creds - set the new creds on the bprm struct + * apparmor_bprm_creds_for_exec - Update the new creds on the bprm struct * @bprm: binprm for the exec (NOT NULL) * * Returns: %0 or error on failure * * TODO: once the other paths are done see if we can't refactor into a fn */ -int apparmor_bprm_set_creds(struct linux_binprm *bprm) +int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm) { struct aa_task_ctx *ctx; struct aa_label *label, *new = NULL; @@ -875,9 +875,6 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm) file_inode(bprm->file)->i_mode }; - if (bprm->called_set_creds) - return 0; - ctx = task_ctx(current); AA_BUG(!cred_label(bprm->cred)); 
AA_BUG(!ctx); diff --git a/security/apparmor/include/domain.h b/security/apparmor/include/domain.h index 21b875fe2d37..d14928fe1c6f 100644 --- a/security/apparmor/include/domain.h +++ b/security/apparmor/include/domain.h @@ -30,7 +30,7 @@ struct aa_domain { struct aa_label *x_table_lookup(struct aa_profile *profile, u32 xindex, const char **name); -int apparmor_bprm_set_creds(struct linux_binprm *bprm); +int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm); void aa_free_domain_entries(struct aa_domain *domain); int aa_change_hat(const char *hats[], int count, u64 token, int flags); diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c index b621ad74f54a..3623ab08279d 100644 --- a/security/apparmor/lsm.c +++ b/security/apparmor/lsm.c @@ -1232,7 +1232,7 @@ static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(cred_prepare, apparmor_cred_prepare), LSM_HOOK_INIT(cred_transfer, apparmor_cred_transfer), - LSM_HOOK_INIT(bprm_set_creds, apparmor_bprm_set_creds), + LSM_HOOK_INIT(bprm_creds_for_exec, apparmor_bprm_creds_for_exec), LSM_HOOK_INIT(bprm_committing_creds, apparmor_bprm_committing_creds), LSM_HOOK_INIT(bprm_committed_creds, apparmor_bprm_committed_creds), diff --git a/security/security.c b/security/security.c index 7fed24b9d57e..4ee76a729f73 100644 --- a/security/security.c +++ b/security/security.c @@ -823,6 +823,11 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) return __vm_enough_memory(mm, pages, cap_sys_admin); } +int security_bprm_creds_for_exec(struct linux_binprm *bprm) +{ + return call_int_hook(bprm_creds_for_exec, 0, bprm); +} + int security_bprm_set_creds(struct linux_binprm *bprm) { return call_int_hook(bprm_set_creds, 0, bprm); diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 0b4e32161b77..718345dd76bb 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2286,7 +2286,7 @@ static int check_nnp_nosuid(const struct linux_binprm *bprm, return 
-EACCES; } -static int selinux_bprm_set_creds(struct linux_binprm *bprm) +static int selinux_bprm_creds_for_exec(struct linux_binprm *bprm) { const struct task_security_struct *old_tsec; struct task_security_struct *new_tsec; @@ -2297,8 +2297,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm) /* SELinux context only depends on initial program or script and not * the script interpreter */ - if (bprm->called_set_creds) - return 0; old_tsec = selinux_cred(current_cred()); new_tsec = selinux_cred(bprm->cred); @@ -6385,7 +6383,7 @@ static int selinux_setprocattr(const char *name, void *value, size_t size) /* Permission checking based on the specified context is performed during the actual operation (execve, open/mkdir/...), when we know the full context of the - operation. See selinux_bprm_set_creds for the execve + operation. See selinux_bprm_creds_for_exec for the execve checks and may_create for the file creation checks. The operation will then fail if the context is not permitted. 
*/ tsec = selinux_cred(new); @@ -6914,7 +6912,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(netlink_send, selinux_netlink_send), - LSM_HOOK_INIT(bprm_set_creds, selinux_bprm_set_creds), + LSM_HOOK_INIT(bprm_creds_for_exec, selinux_bprm_creds_for_exec), LSM_HOOK_INIT(bprm_committing_creds, selinux_bprm_committing_creds), LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds), diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 8c61d175e195..0ac8f4518d07 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -891,12 +891,12 @@ static int smack_sb_statfs(struct dentry *dentry) */ /** - * smack_bprm_set_creds - set creds for exec + * smack_bprm_creds_for_exec - Update bprm->cred if needed for exec * @bprm: the exec information * * Returns 0 if it gets a blob, -EPERM if exec forbidden and -ENOMEM otherwise */ -static int smack_bprm_set_creds(struct linux_binprm *bprm) +static int smack_bprm_creds_for_exec(struct linux_binprm *bprm) { struct inode *inode = file_inode(bprm->file); struct task_smack *bsp = smack_cred(bprm->cred); @@ -904,9 +904,6 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm) struct superblock_smack *sbsp; int rc; - if (bprm->called_set_creds) - return 0; - isp = smack_inode(inode); if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task) return 0; @@ -4598,7 +4595,7 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(sb_statfs, smack_sb_statfs), LSM_HOOK_INIT(sb_set_mnt_opts, smack_set_mnt_opts), - LSM_HOOK_INIT(bprm_set_creds, smack_bprm_set_creds), + LSM_HOOK_INIT(bprm_creds_for_exec, smack_bprm_creds_for_exec), LSM_HOOK_INIT(inode_alloc_security, smack_inode_alloc_security), LSM_HOOK_INIT(inode_init_security, smack_inode_init_security), diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c index 716c92ec941a..f9adddc42ac8 100644 --- a/security/tomoyo/tomoyo.c +++ 
b/security/tomoyo/tomoyo.c @@ -63,20 +63,14 @@ static void tomoyo_bprm_committed_creds(struct linux_binprm *bprm) #ifndef CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER /** - * tomoyo_bprm_set_creds - Target for security_bprm_set_creds(). + * tomoyo_bprm_for_exec - Target for security_bprm_creds_for_exec(). * * @bprm: Pointer to "struct linux_binprm". * * Returns 0. */ -static int tomoyo_bprm_set_creds(struct linux_binprm *bprm) +static int tomoyo_bprm_creds_for_exec(struct linux_binprm *bprm) { - /* - * Do only if this function is called for the first time of an execve - * operation. - */ - if (bprm->called_set_creds) - return 0; /* * Load policy if /sbin/tomoyo-init exists and /sbin/init is requested * for the first time. @@ -539,7 +533,7 @@ static struct security_hook_list tomoyo_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(task_alloc, tomoyo_task_alloc), LSM_HOOK_INIT(task_free, tomoyo_task_free), #ifndef CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER - LSM_HOOK_INIT(bprm_set_creds, tomoyo_bprm_set_creds), + LSM_HOOK_INIT(bprm_creds_for_exec, tomoyo_bprm_creds_for_exec), #endif LSM_HOOK_INIT(bprm_check_security, tomoyo_bprm_check_security), LSM_HOOK_INIT(file_fcntl, tomoyo_file_fcntl), -- 2.25.0
Rename bprm->cap_elevated to bprm->active_secureexec and initialize it
in prepare_binprm instead of in cap_bprm_set_creds.  Initializing
bprm->active_secureexec in prepare_binprm allows multiple
implementations of security_bprm_repopulate_creds to play nicely with
each other.

Rename security_bprm_set_creds to security_bprm_repopulate_creds to
emphasize that this path recomputes part of bprm->cred.  This
recomputation avoids the time of check vs time of use problems that
are inherent in unix #! interpreters.

In short, two renames and a move of the place where
bprm->active_secureexec is initialized.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                     | 8 ++++----
 include/linux/binfmts.h       | 4 ++--
 include/linux/lsm_hook_defs.h | 2 +-
 include/linux/lsm_hooks.h     | 4 ++--
 include/linux/security.h      | 8 ++++----
 security/commoncap.c          | 9 ++++-----
 security/security.c           | 4 ++--
 7 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c index 9e70da47f8d9..8e3b93d51d31 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1366,7 +1366,7 @@ int begin_new_exec(struct linux_binprm * bprm) * the final state of setuid/setgid/fscaps can be merged into the * secureexec flag. */ - bprm->secureexec |= bprm->cap_elevated; + bprm->secureexec |= bprm->active_secureexec; if (bprm->secureexec) { /* Make sure parent cannot signal privileged process. 
*/ @@ -1634,10 +1634,10 @@ int prepare_binprm(struct linux_binprm *bprm) int retval; loff_t pos = 0; + /* Recompute parts of bprm->cred based on bprm->file */ + bprm->active_secureexec = 0; bprm_fill_uid(bprm); - - /* fill in binprm security blob */ - retval = security_bprm_set_creds(bprm); + retval = security_bprm_repopulate_creds(bprm); if (retval) return retval; diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index d1217fcdedea..8605ab4a0f89 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -27,10 +27,10 @@ struct linux_binprm { unsigned long argmin; /* rlimit marker for copy_strings() */ unsigned int /* - * True if most recent call to cap_bprm_set_creds + * True if most recent call to security_bprm_set_creds * resulted in elevated privileges. */ - cap_elevated:1, + active_secureexec:1, /* * Set by bprm_creds_for_exec hook to indicate a * privilege-gaining exec has happened. Used to set diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index aab0695f41df..1e295ba12c0d 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -50,7 +50,7 @@ LSM_HOOK(int, 0, settime, const struct timespec64 *ts, const struct timezone *tz) LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages) LSM_HOOK(int, 0, bprm_creds_for_exec, struct linux_binprm *bprm) -LSM_HOOK(int, 0, bprm_set_creds, struct linux_binprm *bprm) +LSM_HOOK(int, 0, bprm_repopulate_creds, struct linux_binprm *bprm) LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm) LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm) LSM_HOOK(void, LSM_RET_VOID, bprm_committed_creds, struct linux_binprm *bprm) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index c719af37df20..d618ecc4d660 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -44,7 +44,7 @@ * request libc enable secure mode. * @bprm contains the linux_binprm structure. 
* Return 0 if the hook is successful and permission is granted. - * @bprm_set_creds: + * @bprm_repopulate_creds: * Assuming that the relevant bits of @bprm->cred->security have been * previously set, examine @bprm->file and regenerate them. This is * so that the credentials derived from the interpreter the code is @@ -53,7 +53,7 @@ * reopen script, and may end up opening something completely different. * This hook may also optionally check permissions (e.g. for * transitions between security domains). - * The hook must set @bprm->cap_elevated to 1 if AT_SECURE should be set to + * The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to * request libc enable secure mode. * @bprm contains the linux_binprm structure. * Return 0 if the hook is successful and permission is granted. diff --git a/include/linux/security.h b/include/linux/security.h index 1bd7a6582775..d23f078eb589 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -140,7 +140,7 @@ extern int cap_capset(struct cred *new, const struct cred *old, const kernel_cap_t *effective, const kernel_cap_t *inheritable, const kernel_cap_t *permitted); -extern int cap_bprm_set_creds(struct linux_binprm *bprm); +extern int cap_bprm_repopulate_creds(struct linux_binprm *bprm); extern int cap_inode_setxattr(struct dentry *dentry, const char *name, const void *value, size_t size, int flags); extern int cap_inode_removexattr(struct dentry *dentry, const char *name); @@ -277,7 +277,7 @@ int security_syslog(int type); int security_settime64(const struct timespec64 *ts, const struct timezone *tz); int security_vm_enough_memory_mm(struct mm_struct *mm, long pages); int security_bprm_creds_for_exec(struct linux_binprm *bprm); -int security_bprm_set_creds(struct linux_binprm *bprm); +int security_bprm_repopulate_creds(struct linux_binprm *bprm); int security_bprm_check(struct linux_binprm *bprm); void security_bprm_committing_creds(struct linux_binprm *bprm); void 
security_bprm_committed_creds(struct linux_binprm *bprm); @@ -575,9 +575,9 @@ static inline int security_bprm_creds_for_exec(struct linux_binprm *bprm) return 0; } -static inline int security_bprm_set_creds(struct linux_binprm *bprm) +static inline int security_bprm_repopulate_creds(struct linux_binprm *bprm) { - return cap_bprm_set_creds(bprm); + return cap_bprm_repopulate_creds(bprm); } static inline int security_bprm_check(struct linux_binprm *bprm) diff --git a/security/commoncap.c b/security/commoncap.c index f4ee0ae106b2..045b5b80ea40 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -797,14 +797,14 @@ static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old, } /** - * cap_bprm_set_creds - Set up the proposed credentials for execve(). + * cap_bprm_repopulate_creds - Set up the proposed credentials for execve(). * @bprm: The execution parameters, including the proposed creds * * Set up the proposed credentials for a new execution context being * constructed by execve(). The proposed creds in @bprm->cred is altered, * which won't take effect immediately. Returns 0 if successful, -ve on error. */ -int cap_bprm_set_creds(struct linux_binprm *bprm) +int cap_bprm_repopulate_creds(struct linux_binprm *bprm) { const struct cred *old = current_cred(); struct cred *new = bprm->cred; @@ -884,12 +884,11 @@ int cap_bprm_set_creds(struct linux_binprm *bprm) return -EPERM; /* Check for privilege-elevated exec. 
*/ - bprm->cap_elevated = 0; if (is_setid || (!__is_real(root_uid, new) && (effective || __cap_grew(permitted, ambient, new)))) - bprm->cap_elevated = 1; + bprm->active_secureexec = 1; return 0; } @@ -1346,7 +1345,7 @@ static struct security_hook_list capability_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(ptrace_traceme, cap_ptrace_traceme), LSM_HOOK_INIT(capget, cap_capget), LSM_HOOK_INIT(capset, cap_capset), - LSM_HOOK_INIT(bprm_set_creds, cap_bprm_set_creds), + LSM_HOOK_INIT(bprm_repopulate_creds, cap_bprm_repopulate_creds), LSM_HOOK_INIT(inode_need_killpriv, cap_inode_need_killpriv), LSM_HOOK_INIT(inode_killpriv, cap_inode_killpriv), LSM_HOOK_INIT(inode_getsecurity, cap_inode_getsecurity), diff --git a/security/security.c b/security/security.c index 4ee76a729f73..b890b7e2a765 100644 --- a/security/security.c +++ b/security/security.c @@ -828,9 +828,9 @@ int security_bprm_creds_for_exec(struct linux_binprm *bprm) return call_int_hook(bprm_creds_for_exec, 0, bprm); } -int security_bprm_set_creds(struct linux_binprm *bprm) +int security_bprm_repopulate_creds(struct linux_binprm *bprm) { - return call_int_hook(bprm_set_creds, 0, bprm); + return call_int_hook(bprm_repopulate_creds, 0, bprm); } int security_bprm_check(struct linux_binprm *bprm) -- 2.25.0
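
As a rough userspace illustration of why the reset moves out of the
hook: if each hook cleared the bit itself, the last hook to run could
erase what an earlier one had set.  The sketch below is not kernel
code.  struct toy_bprm, toy_hook, hook_sets, hook_noop and
toy_repopulate are names made up for this example, and only the
ordering (clear once, then let every hook set the bit) is taken from
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the one bit discussed above; not struct linux_binprm. */
struct toy_bprm {
	bool active_secureexec;	/* recomputed for every file in the chain */
};

typedef int (*toy_hook)(struct toy_bprm *bprm);

/* Hypothetical hooks: a hook may set the bit but never clears it. */
static int hook_sets(struct toy_bprm *bprm)
{
	bprm->active_secureexec = true;
	return 0;
}

static int hook_noop(struct toy_bprm *bprm)
{
	(void)bprm;
	return 0;
}

/* Same ordering as the new prepare_binprm: clear once, then run every hook. */
static int toy_repopulate(struct toy_bprm *bprm, const toy_hook *hooks, size_t n)
{
	assert(bprm != NULL);

	bprm->active_secureexec = false;
	for (size_t i = 0; i < n; i++) {
		int rc = hooks[i](bprm);

		if (rc)
			return rc;
	}
	return 0;
}
```

With the hooks { hook_sets, hook_noop } the bit stays set after the
call, whatever order the hooks run in.  A later call with only
hook_noop leaves it clear, which models the recomputation for the next
file in an interpreter chain.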
Add a flag preserve_creds that binfmt_misc can set to prevent
credentials from being updated.  This allows binfmt_misc to always
call prepare_binprm, which in turn allows the credential computation
logic to be consolidated.

Not replacing the credentials with the interpreter's credentials is
safe because an open file descriptor to the executable is passed to
the interpreter.  As the interpreter does not need to reopen the
executable it is guaranteed to see the same file that exec sees.

Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/binfmt_misc.c        | 15 +++------------
 fs/exec.c               | 19 ++++++++++++-------
 include/linux/binfmts.h |  2 ++
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index cdb45829354d..264829745d6f 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -218,19 +218,10 @@ static int load_misc_binary(struct linux_binprm *bprm) goto error; bprm->file = interp_file; - if (fmt->flags & MISC_FMT_CREDENTIALS) { - loff_t pos = 0; - - /* - * No need to call prepare_binprm(), it's already been - * done. bprm->buf is stale, update from interp_file. 
- */ - memset(bprm->buf, 0, BINPRM_BUF_SIZE); - retval = kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, - &pos); - } else - retval = prepare_binprm(bprm); + if (fmt->flags & MISC_FMT_CREDENTIALS) + bprm->preserve_creds = 1; + retval = prepare_binprm(bprm); if (retval < 0) goto error; diff --git a/fs/exec.c b/fs/exec.c index 8e3b93d51d31..028e0e323af5 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1631,15 +1631,20 @@ static void bprm_fill_uid(struct linux_binprm *bprm) */ int prepare_binprm(struct linux_binprm *bprm) { - int retval; loff_t pos = 0; - /* Recompute parts of bprm->cred based on bprm->file */ - bprm->active_secureexec = 0; - bprm_fill_uid(bprm); - retval = security_bprm_repopulate_creds(bprm); - if (retval) - return retval; + /* Can the interpreter get to the executable without races? */ + if (!bprm->preserve_creds) { + int retval; + + /* Recompute parts of bprm->cred based on bprm->file */ + bprm->active_secureexec = 0; + bprm_fill_uid(bprm); + retval = security_bprm_repopulate_creds(bprm); + if (retval) + return retval; + } + bprm->preserve_creds = 0; memset(bprm->buf, 0, BINPRM_BUF_SIZE); return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos); diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 8605ab4a0f89..dbb5614d62a2 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -26,6 +26,8 @@ struct linux_binprm { unsigned long p; /* current top of mem */ unsigned long argmin; /* rlimit marker for copy_strings() */ unsigned int + /* It is safe to use the creds of a script (see binfmt_misc) */ + preserve_creds:1, /* * True if most recent call to security_bprm_set_creds * resulted in elevated privileges. -- 2.25.0
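
The one-shot nature of preserve_creds can be sketched in userspace as
follows.  This is only a model under stated assumptions: struct
toy_bprm and toy_prepare are hypothetical names, and the counter
merely stands in for the bprm_fill_uid plus
security_bprm_repopulate_creds step of the real prepare_binprm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model; the real fields live in struct linux_binprm. */
struct toy_bprm {
	bool preserve_creds;	/* set by a handler for the next prepare only */
	int recomputed;		/* counts credential recomputations */
};

/* Mirrors the shape of the new prepare_binprm: skip once, then clear the flag. */
static void toy_prepare(struct toy_bprm *bprm)
{
	assert(bprm != NULL);

	if (!bprm->preserve_creds)
		bprm->recomputed++;	/* stand-in for the credential recomputation */
	bprm->preserve_creds = false;
}
```

A handler that sets preserve_creds suppresses exactly one
recomputation.  Because the flag is cleared on every call, the next
interpreter in the chain gets its credentials recomputed again unless
a handler sets the flag once more.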
The code in prepare_binprm needs to be run every time
search_binary_handler is called, so move the call into
search_binary_handler itself to make the code simpler and easier to
understand.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/alpha/kernel/binfmt_loader.c |  3 ---
 fs/binfmt_em86.c                  |  4 ----
 fs/binfmt_misc.c                  |  4 ----
 fs/binfmt_script.c                |  3 ---
 fs/exec.c                         | 12 +++++-------
 include/linux/binfmts.h           |  1 -
 6 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c index a8d0d6e06526..d712ba51d15a 100644 --- a/arch/alpha/kernel/binfmt_loader.c +++ b/arch/alpha/kernel/binfmt_loader.c @@ -35,9 +35,6 @@ static int load_binary(struct linux_binprm *bprm) bprm->file = file; bprm->loader = loader; - retval = prepare_binprm(bprm); - if (retval < 0) - return retval; return search_binary_handler(bprm); } diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c index 466497860c62..cedde2341ade 100644 --- a/fs/binfmt_em86.c +++ b/fs/binfmt_em86.c @@ -91,10 +91,6 @@ static int load_em86(struct linux_binprm *bprm) bprm->file = file; - retval = prepare_binprm(bprm); - if (retval < 0) - return retval; - return search_binary_handler(bprm); } diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index 264829745d6f..50a73afdf9b7 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -221,10 +221,6 @@ static int load_misc_binary(struct linux_binprm *bprm) if (fmt->flags & MISC_FMT_CREDENTIALS) bprm->preserve_creds = 1; - retval = prepare_binprm(bprm); - if (retval < 0) - goto error; - retval = search_binary_handler(bprm); if (retval < 0) goto error; diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c index e9e6a6f4a35f..8d718d8fd0fe 100644 --- a/fs/binfmt_script.c +++ b/fs/binfmt_script.c @@ -143,9 +143,6 @@ static int load_script(struct linux_binprm *bprm) return PTR_ERR(file); bprm->file = file; - retval = prepare_binprm(bprm); - if (retval < 0) - return retval; return 
search_binary_handler(bprm); } diff --git a/fs/exec.c b/fs/exec.c index 028e0e323af5..5fc458460e44 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1629,7 +1629,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm) * * This may be called multiple times for binary chains (scripts for example). */ -int prepare_binprm(struct linux_binprm *bprm) +static int prepare_binprm(struct linux_binprm *bprm) { loff_t pos = 0; @@ -1650,8 +1650,6 @@ int prepare_binprm(struct linux_binprm *bprm) return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos); } -EXPORT_SYMBOL(prepare_binprm); - /* * Arguments are '\0' separated strings found at the location bprm->p * points to; chop off the first by relocating brpm->p to right after @@ -1707,6 +1705,10 @@ int search_binary_handler(struct linux_binprm *bprm) if (bprm->recursion_depth > 5) return -ELOOP; + retval = prepare_binprm(bprm); + if (retval < 0) + return retval; + retval = security_bprm_check(bprm); if (retval) return retval; @@ -1864,10 +1866,6 @@ static int __do_execve_file(int fd, struct filename *filename, if (retval) goto out; - retval = prepare_binprm(bprm); - if (retval < 0) - goto out; - retval = copy_strings_kernel(1, &bprm->filename, bprm); if (retval < 0) goto out; diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index dbb5614d62a2..8c7779d6bf19 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -116,7 +116,6 @@ static inline void insert_binfmt(struct linux_binfmt *fmt) extern void unregister_binfmt(struct linux_binfmt *); -extern int prepare_binprm(struct linux_binprm *); extern int __must_check remove_arg_zero(struct linux_binprm *); extern int search_binary_handler(struct linux_binprm *); extern int begin_new_exec(struct linux_binprm * bprm); -- 2.25.0
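
The resulting control flow can be modelled in userspace roughly like
this.  It is a toy under loud assumptions: a "file" is just a string
holding its contents, the toy_* names are invented for the example,
and the depth is bumped in the toy script handler purely for brevity.
What it is meant to show is that once the prepare step lives inside
the search function, every level of an interpreter chain gets exactly
one prepare without any handler having to remember to call it.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Toy model; a "file" here is simply a string with its contents. */
struct toy_bprm {
	const char *file;
	int depth;		/* stand-in for bprm->recursion_depth */
	int prepared;		/* how many times the buffer was (re)read */
};

static int toy_search(struct toy_bprm *bprm);

static int toy_prepare(struct toy_bprm *bprm)
{
	bprm->prepared++;	/* stand-in for rereading bprm->buf */
	return 0;
}

/* Toy script handler: swap in the "interpreter" and search again. */
static int toy_load_script(struct toy_bprm *bprm)
{
	int rc;

	if (strncmp(bprm->file, "#!", 2) != 0)
		return -ENOEXEC;
	bprm->file += 2;	/* the rest of the string plays the interpreter */
	bprm->depth++;
	rc = toy_search(bprm);
	bprm->depth--;
	return rc;
}

/* Toy final handler: accept anything that starts with the ELF magic. */
static int toy_load_elf(struct toy_bprm *bprm)
{
	return strncmp(bprm->file, "\177ELF", 4) == 0 ? 0 : -ENOEXEC;
}

/* The prepare step now sits here, so no handler has to call it. */
static int toy_search(struct toy_bprm *bprm)
{
	int rc;

	assert(bprm != NULL && bprm->file != NULL);

	if (bprm->depth > 5)
		return -ELOOP;

	rc = toy_prepare(bprm);
	if (rc < 0)
		return rc;

	rc = toy_load_script(bprm);
	if (rc != -ENOEXEC)
		return rc;
	return toy_load_elf(bprm);
}
```

For a two-level chain such as "#!#!\177ELF" the model prepares three
times, once for each file that is examined.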
The return code -ENOEXEC serves to tell search_binary_handler that it
should continue searching for the binfmt to handle a given file.  This
makes returning -ENOEXEC with a modified bprm->buf problematic, as
bprm->buf is needed to continue the search.

The current binfmt_script manages to escape problems as it closes and
clears bprm->file before returning -ENOEXEC with bprm->buf modified.
This prevents search_binary_handler from looping as it explicitly
handles a NULL bprm->file.

I plan on moving all of the bprm->file management into fs/exec.c and
out of the binary handlers so this will become a problem.

Move closing bprm->file and the test for
BINPRM_FLAGS_PATH_INACCESSIBLE down below the last return of -ENOEXEC.

Introduce i_sep and i_end to track the end of the interpreter name and
the end of the parameters respectively.  Using those, the
constification of all char * pointers, and the helpers next_terminator
and next_non_spacetab guarantees the parameter parsing will not modify
bprm->buf.

Only modify bprm->buf to terminate the strings i_arg and i_name with
'\0' for passing to copy_strings_kernel.

When replacing loops with next_non_spacetab and next_terminator care
has been taken that the logic of the parsing code (short of replacing
characters by '\0') remains the same.

Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- fs/binfmt_script.c | 80 ++++++++++++++++++++++------------------------ 1 file changed, 38 insertions(+), 42 deletions(-) diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c index 8d718d8fd0fe..85e0ef86eb11 100644 --- a/fs/binfmt_script.c +++ b/fs/binfmt_script.c @@ -16,14 +16,14 @@ #include <linux/fs.h> static inline bool spacetab(char c) { return c == ' ' || c == '\t'; } -static inline char *next_non_spacetab(char *first, const char *last) +static inline const char *next_non_spacetab(const char *first, const char *last) { for (; first <= last; first++) if (!spacetab(*first)) return first; return NULL; } -static inline char *next_terminator(char *first, const char *last) +static inline const char *next_terminator(const char *first, const char *last) { for (; first <= last; first++) if (spacetab(*first) || !*first) @@ -33,8 +33,7 @@ static inline char *next_terminator(char *first, const char *last) static int load_script(struct linux_binprm *bprm) { - const char *i_arg, *i_name; - char *cp, *buf_end; + const char *i_name, *i_sep, *i_arg, *i_end, *buf_end; struct file *file; int retval; @@ -42,20 +41,6 @@ static int load_script(struct linux_binprm *bprm) if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!')) return -ENOEXEC; - /* - * If the script filename will be inaccessible after exec, typically - * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give - * up now (on the assumption that the interpreter will want to load - * this file). - */ - if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE) - return -ENOENT; - - /* Release since we are not mapping a binary into memory. */ - allow_write_access(bprm->file); - fput(bprm->file); - bprm->file = NULL; - /* * This section handles parsing the #! line into separate * interpreter path and argument strings. We must be careful @@ -71,39 +56,48 @@ static int load_script(struct linux_binprm *bprm) * parse them on its own. 
*/ buf_end = bprm->buf + sizeof(bprm->buf) - 1; - cp = strnchr(bprm->buf, sizeof(bprm->buf), '\n'); - if (!cp) { - cp = next_non_spacetab(bprm->buf + 2, buf_end); - if (!cp) + i_end = strnchr(bprm->buf, sizeof(bprm->buf), '\n'); + if (!i_end) { + i_end = next_non_spacetab(bprm->buf + 2, buf_end); + if (!i_end) return -ENOEXEC; /* Entire buf is spaces/tabs */ /* * If there is no later space/tab/NUL we must assume the * interpreter path is truncated. */ - if (!next_terminator(cp, buf_end)) + if (!next_terminator(i_end, buf_end)) return -ENOEXEC; - cp = buf_end; + i_end = buf_end; } - /* NUL-terminate the buffer and any trailing spaces/tabs. */ - *cp = '\0'; - while (cp > bprm->buf) { - cp--; - if ((*cp == ' ') || (*cp == '\t')) - *cp = '\0'; - else - break; - } - for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++); - if (*cp == '\0') + /* Trim any trailing spaces/tabs from i_end */ + while (spacetab(i_end[-1])) + i_end--; + + /* Skip over leading spaces/tabs */ + i_name = next_non_spacetab(bprm->buf+2, i_end); + if (!i_name || (i_name == i_end)) return -ENOEXEC; /* No interpreter name found */ - i_name = cp; + + /* Is there an optional argument? */ i_arg = NULL; - for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++) - /* nothing */ ; - while ((*cp == ' ') || (*cp == '\t')) - *cp++ = '\0'; - if (*cp) - i_arg = cp; + i_sep = next_terminator(i_name, i_end); + if (i_sep && (*i_sep != '\0')) + i_arg = next_non_spacetab(i_sep, i_end); + + /* + * If the script filename will be inaccessible after exec, typically + * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give + * up now (on the assumption that the interpreter will want to load + * this file). + */ + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE) + return -ENOENT; + + /* Release since we are not mapping a binary into memory. */ + allow_write_access(bprm->file); + fput(bprm->file); + bprm->file = NULL; + /* * OK, we've parsed out the interpreter name and * (optional) argument. 
@@ -121,7 +115,9 @@ static int load_script(struct linux_binprm *bprm) if (retval < 0) return retval; bprm->argc++; + *((char *)i_end) = '\0'; if (i_arg) { + *((char *)i_sep) = '\0'; retval = copy_strings_kernel(1, &i_arg, bprm); if (retval < 0) return retval; -- 2.25.0
Most of the support for passing the file descriptor of an executable
to an interpreter already lives in the generic code and in binfmt_elf.
Rework the fields in binfmt_elf that deal with executable file
descriptor passing to make executable file descriptor passing a first
class concept.

Move the fd_install from binfmt_misc into begin_new_exec after the new
creds have been installed.  This means that an access to the file
through /proc/<pid>/fd/N sees the creds of the new executable before
access to the new executable's files is allowed.

Performing the install of the executable's file descriptor after the
point of no return also means that nothing special needs to be done on
error.  The exiting of the process will close all of its open files.

Move the would_dump from binfmt_misc into begin_new_exec right after
would_dump is called on the bprm->file.  This makes it obvious this
case exists and that no nesting of bprm->file is currently supported.

In binfmt_misc the movement of fd_install into generic code means that
its special error exit path is no longer needed.

Signed-off-by: "Eric W. 
Biederman" <ebiederm@xmission.com> --- fs/binfmt_elf.c | 4 ++-- fs/binfmt_elf_fdpic.c | 4 ++-- fs/binfmt_misc.c | 40 ++++++++-------------------------------- fs/exec.c | 15 +++++++++++++++ include/linux/binfmts.h | 10 +++++----- 5 files changed, 32 insertions(+), 41 deletions(-) diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 396d5c2e6b5e..441c85f04dfd 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -273,8 +273,8 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec, NEW_AUX_ENT(AT_BASE_PLATFORM, (elf_addr_t)(unsigned long)u_base_platform); } - if (bprm->interp_flags & BINPRM_FLAGS_EXECFD) { - NEW_AUX_ENT(AT_EXECFD, bprm->interp_data); + if (bprm->have_execfd) { + NEW_AUX_ENT(AT_EXECFD, bprm->execfd); } #undef NEW_AUX_ENT /* AT_NULL is zero; clear the rest too */ diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index 896e3ca9bf85..2d5e9eb12075 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -628,10 +628,10 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm, (elf_addr_t) (unsigned long) u_base_platform); } - if (bprm->interp_flags & BINPRM_FLAGS_EXECFD) { + if (bprm->have_execfd) { nr = 0; csp -= 2 * sizeof(unsigned long); - NEW_AUX_ENT(AT_EXECFD, bprm->interp_data); + NEW_AUX_ENT(AT_EXECFD, bprm->execfd); } nr = 0; diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c index 50a73afdf9b7..ad2866f28f0c 100644 --- a/fs/binfmt_misc.c +++ b/fs/binfmt_misc.c @@ -134,7 +134,6 @@ static int load_misc_binary(struct linux_binprm *bprm) Node *fmt; struct file *interp_file = NULL; int retval; - int fd_binary = -1; retval = -ENOEXEC; if (!enabled) @@ -161,29 +160,12 @@ static int load_misc_binary(struct linux_binprm *bprm) } if (fmt->flags & MISC_FMT_OPEN_BINARY) { - - /* if the binary should be opened on behalf of the - * interpreter than keep it open and assign descriptor - * to it - */ - fd_binary = get_unused_fd_flags(0); - if (fd_binary < 0) { - retval = fd_binary; - goto ret; - } - fd_install(fd_binary, 
bprm->file); - - /* if the binary is not readable than enforce mm->dumpable=0 - regardless of the interpreter's permissions */ - would_dump(bprm, bprm->file); + /* Pass the open binary to the interpreter */ + bprm->have_execfd = 1; + bprm->executable = bprm->file; allow_write_access(bprm->file); bprm->file = NULL; - - /* mark the bprm that fd should be passed to interp */ - bprm->interp_flags |= BINPRM_FLAGS_EXECFD; - bprm->interp_data = fd_binary; - } else { allow_write_access(bprm->file); fput(bprm->file); @@ -192,19 +174,19 @@ static int load_misc_binary(struct linux_binprm *bprm) /* make argv[1] be the path to the binary */ retval = copy_strings_kernel(1, &bprm->interp, bprm); if (retval < 0) - goto error; + goto ret; bprm->argc++; /* add the interp as argv[0] */ retval = copy_strings_kernel(1, &fmt->interpreter, bprm); if (retval < 0) - goto error; + goto ret; bprm->argc++; /* Update interp in case binfmt_script needs it. */ retval = bprm_change_interp(fmt->interpreter, bprm); if (retval < 0) - goto error; + goto ret; if (fmt->flags & MISC_FMT_OPEN_FILE) { interp_file = file_clone_open(fmt->interp_file); @@ -215,7 +197,7 @@ static int load_misc_binary(struct linux_binprm *bprm) } retval = PTR_ERR(interp_file); if (IS_ERR(interp_file)) - goto error; + goto ret; bprm->file = interp_file; if (fmt->flags & MISC_FMT_CREDENTIALS) @@ -223,17 +205,11 @@ static int load_misc_binary(struct linux_binprm *bprm) retval = search_binary_handler(bprm); if (retval < 0) - goto error; + goto ret; ret: dput(fmt->dentry); return retval; -error: - if (fd_binary > 0) - ksys_close(fd_binary); - bprm->interp_flags = 0; - bprm->interp_data = 0; - goto ret; } /* Command parsers */ diff --git a/fs/exec.c b/fs/exec.c index 5fc458460e44..ca91393893ea 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1323,7 +1323,10 @@ int begin_new_exec(struct linux_binprm * bprm) */ set_mm_exe_file(bprm->mm, bprm->file); + /* If the binary is not readable than enforce mm->dumpable=0 */ would_dump(bprm, 
bprm->file); + if (bprm->have_execfd) + would_dump(bprm, bprm->executable); /* * Release all of the old mmap stuff @@ -1427,6 +1430,16 @@ int begin_new_exec(struct linux_binprm * bprm) * credentials; any time after this it may be unlocked. */ security_bprm_committed_creds(bprm); + + /* Pass the opened binary to the interpreter. */ + if (bprm->have_execfd) { + retval = get_unused_fd_flags(0); + if (retval < 0) + goto out_unlock; + fd_install(retval, bprm->executable); + bprm->executable = NULL; + bprm->execfd = retval; + } return 0; out_unlock: @@ -1516,6 +1529,8 @@ static void free_bprm(struct linux_binprm *bprm) allow_write_access(bprm->file); fput(bprm->file); } + if (bprm->executable) + fput(bprm->executable); /* If a binfmt changed the interp, free it. */ if (bprm->interp != bprm->filename) kfree(bprm->interp); diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 8c7779d6bf19..653508b25815 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -26,6 +26,9 @@ struct linux_binprm { unsigned long p; /* current top of mem */ unsigned long argmin; /* rlimit marker for copy_strings() */ unsigned int + /* Should an execfd be passed to userspace? */ + have_execfd:1, + /* It is safe to use the creds of a script (see binfmt_misc) */ preserve_creds:1, /* @@ -48,6 +51,7 @@ struct linux_binprm { unsigned int taso:1; #endif unsigned int recursion_depth; /* only for search_binary_handler() */ + struct file * executable; /* Executable to pass to the interpreter */ struct file * file; struct cred *cred; /* new credentials */ int unsafe; /* how unsafe this exec is (mask of LSM_UNSAFE_*) */ @@ -58,7 +62,7 @@ struct linux_binprm { of the time same as filename, but could be different for binfmt_{misc,script} */ unsigned interp_flags; - unsigned interp_data; + int execfd; /* File descriptor of the executable */ unsigned long loader, exec; struct rlimit rlim_stack; /* Saved RLIMIT_STACK used during exec. 
*/ @@ -69,10 +73,6 @@ struct linux_binprm { #define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0 #define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT) -/* fd of the binary should be passed to the interpreter */ -#define BINPRM_FLAGS_EXECFD_BIT 1 -#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT) - /* filename of the binary will be inaccessible after exec */ #define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2 #define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT) -- 2.25.0
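
For readers following along from the userspace side: the descriptor that begin_new_exec installs reaches the interpreter through the AT_EXECFD auxiliary vector entry. Below is a minimal sketch of how an interpreter might pick it up. It is not part of this series. It assumes a libc whose getauxval() sets errno to ENOENT when the entry is absent (glibc 2.19 or later does), and the helper name interp_get_execfd is hypothetical.

```c
#include <assert.h>
#include <elf.h>        /* AT_EXECFD */
#include <errno.h>
#include <limits.h>
#include <sys/auxv.h>   /* getauxval() */

/*
 * Hypothetical helper: return the file descriptor the kernel passed
 * via AT_EXECFD, or -1 if no such auxv entry was supplied.
 *
 * getauxval() returns 0 both for "entry missing" and for a genuine
 * value of 0, so errno is used to tell the two cases apart.
 */
static int interp_get_execfd(void)
{
	unsigned long val;

	errno = 0;
	val = getauxval(AT_EXECFD);
	if (val == 0 && errno == ENOENT)
		return -1;

	/* A file descriptor always fits in an int. */
	assert(val <= INT_MAX);
	return (int)val;
}
```

A program started directly, rather than through a binfmt_misc entry that requests the open binary, would normally see -1 here.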
Recursion in kernel code is generally a bad idea as it can overflow
the kernel stack. Recursion in exec also hides that the code is
looping and that the loop changes bprm->file.

Instead of recursing in search_binary_handler have the methods that
would recurse set bprm->interpreter and return 0. Modify exec_binprm
to loop when bprm->interpreter is set. Consolidate all of the
reassignments of bprm->file in that loop to make it clear what is
going on.

The structure of the new loop in exec_binprm is that all errors return
immediately, while successful completion (ret == 0 &&
!bprm->interpreter) just breaks out of the loop and runs what
exec_binprm has always run upon successful completion.

Fail if an interpreter is being called after execfd has been set.
The code has never properly handled an interpreter being called with
execfd being set, and with the reassignments of bprm->file and the
assignment of bprm->executable in generic code it has finally become
possible to test for and fail when this problematic condition happens.

With the reassignments of bprm->file and the assignment of
bprm->executable moved into the generic code add a test to see if
bprm->executable is being reassigned.

In search_binary_handler remove the test for !bprm->file. With all
reassignments of bprm->file moved to exec_binprm bprm->file can never
be NULL in search_binary_handler.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/alpha/kernel/binfmt_loader.c |  8 ++---
 fs/binfmt_em86.c                  |  9 ++----
 fs/binfmt_misc.c                  | 18 ++---------
 fs/binfmt_script.c                |  9 ++----
 fs/exec.c                         | 51 ++++++++++++++++++++-----------
 include/linux/binfmts.h           |  3 +-
 6 files changed, 43 insertions(+), 55 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c
index d712ba51d15a..e4be7a543ecf 100644
--- a/arch/alpha/kernel/binfmt_loader.c
+++ b/arch/alpha/kernel/binfmt_loader.c
@@ -19,10 +19,6 @@ static int load_binary(struct linux_binprm *bprm)
 	if (bprm->loader)
 		return -ENOEXEC;
 
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
-
 	loader = bprm->vma->vm_end - sizeof(void *);
 
 	file = open_exec("/sbin/loader");
@@ -33,9 +29,9 @@ static int load_binary(struct linux_binprm *bprm)
 	/* Remember if the application is TASO. */
 	bprm->taso = eh->ah.entry < 0x100000000UL;
 
-	bprm->file = file;
+	bprm->interpreter = file;
 	bprm->loader = loader;
-	return search_binary_handler(bprm);
+	return 0;
 }
 
 static struct linux_binfmt loader_format = {
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index cedde2341ade..995883693cb2 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -48,10 +48,6 @@ static int load_em86(struct linux_binprm *bprm)
 	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
 		return -ENOENT;
 
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
-
 	/* Unlike in the script case, we don't have to do any hairy
 	 * parsing to find our interpreter... it's hardcoded!
 	 */
@@ -89,9 +85,8 @@ static int load_em86(struct linux_binprm *bprm)
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	bprm->file = file;
-
-	return search_binary_handler(bprm);
+	bprm->interpreter = file;
+	return 0;
 }
 
 static struct linux_binfmt em86_format = {
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index ad2866f28f0c..53968ea07b57 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -159,18 +159,9 @@ static int load_misc_binary(struct linux_binprm *bprm)
 			goto ret;
 	}
 
-	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
-		/* Pass the open binary to the interpreter */
+	if (fmt->flags & MISC_FMT_OPEN_BINARY)
 		bprm->have_execfd = 1;
-		bprm->executable = bprm->file;
-
-		allow_write_access(bprm->file);
-		bprm->file = NULL;
-	} else {
-		allow_write_access(bprm->file);
-		fput(bprm->file);
-		bprm->file = NULL;
-	}
 	/* make argv[1] be the path to the binary */
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0)
@@ -199,14 +190,11 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (IS_ERR(interp_file))
 		goto ret;
 
-	bprm->file = interp_file;
+	bprm->interpreter = interp_file;
 	if (fmt->flags & MISC_FMT_CREDENTIALS)
 		bprm->preserve_creds = 1;
 
-	retval = search_binary_handler(bprm);
-	if (retval < 0)
-		goto ret;
-
+	retval = 0;
 ret:
 	dput(fmt->dentry);
 	return retval;
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 85e0ef86eb11..0e8b953d12cf 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -93,11 +93,6 @@ static int load_script(struct linux_binprm *bprm)
 	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
 		return -ENOENT;
 
-	/* Release since we are not mapping a binary into memory. */
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
-
 	/*
 	 * OK, we've parsed out the interpreter name and
 	 * (optional) argument.
@@ -138,8 +133,8 @@ static int load_script(struct linux_binprm *bprm)
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	bprm->file = file;
-	return search_binary_handler(bprm);
+	bprm->interpreter = file;
+	return 0;
 }
 
 static struct linux_binfmt script_format = {
diff --git a/fs/exec.c b/fs/exec.c
index ca91393893ea..47d831e5efde 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1710,16 +1710,12 @@ EXPORT_SYMBOL(remove_arg_zero);
 /*
  * cycle the list of binary formats handler, until one recognizes the image
  */
-int search_binary_handler(struct linux_binprm *bprm)
+static int search_binary_handler(struct linux_binprm *bprm)
 {
 	bool need_retry = IS_ENABLED(CONFIG_MODULES);
 	struct linux_binfmt *fmt;
 	int retval;
 
-	/* This allows 4 levels of binfmt rewrites before failing hard. */
-	if (bprm->recursion_depth > 5)
-		return -ELOOP;
-
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		return retval;
@@ -1736,14 +1732,11 @@ int search_binary_handler(struct linux_binprm *bprm)
 			continue;
 		read_unlock(&binfmt_lock);
 
-		bprm->recursion_depth++;
 		retval = fmt->load_binary(bprm);
-		bprm->recursion_depth--;
 
 		read_lock(&binfmt_lock);
 		put_binfmt(fmt);
-		if (bprm->point_of_no_return || !bprm->file ||
-		    (retval != -ENOEXEC)) {
+		if (bprm->point_of_no_return || (retval != -ENOEXEC)) {
 			read_unlock(&binfmt_lock);
 			return retval;
 		}
@@ -1762,12 +1755,11 @@ int search_binary_handler(struct linux_binprm *bprm)
 
 	return retval;
 }
-EXPORT_SYMBOL(search_binary_handler);
 
 static int exec_binprm(struct linux_binprm *bprm)
 {
 	pid_t old_pid, old_vpid;
-	int ret;
+	int ret, depth;
 
 	/* Need to fetch pid before load_binary changes it */
 	old_pid = current->pid;
@@ -1775,15 +1767,38 @@ static int exec_binprm(struct linux_binprm *bprm)
 	old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
 	rcu_read_unlock();
 
-	ret = search_binary_handler(bprm);
-	if (ret >= 0) {
-		audit_bprm(bprm);
-		trace_sched_process_exec(current, old_pid, bprm);
-		ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
-		proc_exec_connector(current);
+	/* This allows 4 levels of binfmt rewrites before failing hard. */
+	for (depth = 0;; depth++) {
+		struct file *exec;
+		if (depth > 5)
+			return -ELOOP;
+
+		ret = search_binary_handler(bprm);
+		if (ret < 0)
+			return ret;
+		if (!bprm->interpreter)
+			break;
+
+		exec = bprm->file;
+		bprm->file = bprm->interpreter;
+		bprm->interpreter = NULL;
+
+		allow_write_access(exec);
+		if (unlikely(bprm->have_execfd)) {
+			if (bprm->executable) {
+				fput(exec);
+				return -ENOEXEC;
+			}
+			bprm->executable = exec;
+		} else
+			fput(exec);
 	}
 
-	return ret;
+	audit_bprm(bprm);
+	trace_sched_process_exec(current, old_pid, bprm);
+	ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
+	proc_exec_connector(current);
+	return 0;
 }
 
 /*
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 653508b25815..7fc05929c967 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -50,8 +50,8 @@ struct linux_binprm {
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif
-	unsigned int recursion_depth; /* only for search_binary_handler() */
 	struct file * executable; /* Executable to pass to the interpreter */
+	struct file * interpreter;
 	struct file * file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
@@ -117,7 +117,6 @@ static inline void insert_binfmt(struct linux_binfmt *fmt)
 extern void unregister_binfmt(struct linux_binfmt *);
 
 extern int __must_check remove_arg_zero(struct linux_binprm *);
-extern int search_binary_handler(struct linux_binprm *);
 extern int begin_new_exec(struct linux_binprm * bprm);
 extern void setup_new_exec(struct linux_binprm * bprm);
 extern void finalize_exec(struct linux_binprm *bprm);
-- 
2.25.0
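
To make the control flow described in this changelog easier to see in isolation, here is a small userspace model of the loop. It is only a sketch: every model_* name is hypothetical, a "file" is reduced to a pointer to its interpreter, and reference counting (fput, allow_write_access) is left out. What it keeps is the shape of the loop: a handler asks for an interpreter by setting bprm->interpreter and returning 0, the depth check returns -ELOOP, and a second hop after execfd has been claimed returns -ENOEXEC.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: each file names its interpreter. */
struct model_file {
	const struct model_file *interp; /* NULL for a native binary */
	int wants_execfd;                /* models MISC_FMT_OPEN_BINARY */
};

/* Hypothetical stand-in for the linux_binprm fields used by the loop. */
struct model_bprm {
	const struct model_file *file;
	const struct model_file *interpreter;
	const struct model_file *executable;
	int have_execfd;
};

/* Models a load_binary method: request an interpreter, do not recurse. */
static int model_search_binary_handler(struct model_bprm *bprm)
{
	if (bprm->file->wants_execfd)
		bprm->have_execfd = 1;
	bprm->interpreter = bprm->file->interp;
	return 0;
}

/* Models the loop in exec_binprm; all errors return immediately. */
static int model_exec_binprm(struct model_bprm *bprm)
{
	int depth, ret;

	for (depth = 0;; depth++) {
		const struct model_file *exec;

		if (depth > 5)
			return -ELOOP;

		ret = model_search_binary_handler(bprm);
		if (ret < 0)
			return ret;
		if (!bprm->interpreter)
			break; /* native binary loaded: success */

		/* The only place where bprm->file is reassigned. */
		exec = bprm->file;
		bprm->file = bprm->interpreter;
		bprm->interpreter = NULL;

		if (bprm->have_execfd) {
			if (bprm->executable)
				return -ENOEXEC; /* interpreter after execfd */
			bprm->executable = exec;
		}
	}
	return 0;
}

/*
 * Build a chain of `hops` interpreted files ending in a native binary
 * and run the loop on it. File number `execfd_at` (counted from the
 * native binary, which is 0) asks for execfd; pass -1 for none.
 */
static int model_run_chain(int hops, int execfd_at)
{
	struct model_file files[16] = {{0}};
	struct model_bprm bprm = {0};
	int i;

	assert(hops >= 0 && hops < 16);
	for (i = 1; i <= hops; i++) {
		files[i].interp = &files[i - 1];
		files[i].wants_execfd = (i == execfd_at);
	}
	bprm.file = &files[hops];
	return model_exec_binprm(&bprm);
}
```

In this model model_run_chain(5, -1) succeeds while model_run_chain(6, -1) hits the depth check, and asking for execfd anywhere except on the last interpreted file fails with -ENOEXEC.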
On Mon, May 18, 2020 at 5:32 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> It is hard to follow the control flow in exec.c as the code has evolved over
> time and something that used to work one way now works another. This set of
> changes attempts to address the worst of that, to remove unnecessary work
> and to make the code a little easier to follow.
It is indeed hard to follow, and maybe I missed something, but from
what I can tell, your series looks all sane. It certainly seems to
make things much more straightforward.
Of course, exactly _because_ it's such a messy area, maybe it
introduces something odd, but all the patches look relatively
straightforward. And you remove more lines of code than you add, which
is always nice to see.
So ack from me.
Oleg? Jann? Anybody? Do you see anything strange that I missed?
Linus
On 5/18/2020 5:30 PM, Eric W. Biederman wrote:
> Today security_bprm_set_creds has several implementations:
> apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
> smack_bprm_set_creds, and tomoyo_bprm_set_creds.
>
> Except for cap_bprm_set_creds they all test bprm->called_set_creds and
> return immediately if it is true. The function cap_bprm_set_creds
> ignores bprm->called_set_creds entirely.
>
> Create a new LSM hook security_bprm_creds_for_exec that is called just
> before prepare_binprm in __do_execve_file, resulting in an LSM hook
> that is called exactly once for the entirety of exec. Move the bits
> of security_bprm_set_creds that only want to be called once per exec
> into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
> behind.
>
> Remove bprm->called_set_creds as all of its former users have been
> moved to security_bprm_creds_for_exec.
>
> Add or update comments as appropriate to bring them up to date and
> to reflect this change.
>
> Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com> For the LSM and Smack bits Acked-by: Casey Schaufler <casey@schaufler-ca.com> > --- > fs/exec.c | 6 +++- > include/linux/binfmts.h | 18 +++-------- > include/linux/lsm_hook_defs.h | 1 + > include/linux/lsm_hooks.h | 50 +++++++++++++++++------------- > include/linux/security.h | 6 ++++ > security/apparmor/domain.c | 7 ++--- > security/apparmor/include/domain.h | 2 +- > security/apparmor/lsm.c | 2 +- > security/security.c | 5 +++ > security/selinux/hooks.c | 8 ++--- > security/smack/smack_lsm.c | 9 ++---- > security/tomoyo/tomoyo.c | 12 ++----- > 12 files changed, 63 insertions(+), 63 deletions(-) > > diff --git a/fs/exec.c b/fs/exec.c > index 14b786158aa9..9e70da47f8d9 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -1640,7 +1640,6 @@ int prepare_binprm(struct linux_binprm *bprm) > retval = security_bprm_set_creds(bprm); > if (retval) > return retval; > - bprm->called_set_creds = 1; > > memset(bprm->buf, 0, BINPRM_BUF_SIZE); > return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos); > @@ -1855,6 +1854,11 @@ static int __do_execve_file(int fd, struct filename *filename, > if (retval < 0) > goto out; > > + /* Set the unchanging part of bprm->cred */ > + retval = security_bprm_creds_for_exec(bprm); > + if (retval) > + goto out; > + > retval = prepare_binprm(bprm); > if (retval < 0) > goto out; > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > index 1b48e2154766..d1217fcdedea 100644 > --- a/include/linux/binfmts.h > +++ b/include/linux/binfmts.h > @@ -27,22 +27,14 @@ struct linux_binprm { > unsigned long argmin; /* rlimit marker for copy_strings() */ > unsigned int > /* > - * True after the bprm_set_creds hook has been called once > - * (multiple calls can be made via prepare_binprm() for > - * binfmt_script/misc). 
> - */ > - called_set_creds:1, > - /* > - * True if most recent call to the commoncaps bprm_set_creds > - * hook (due to multiple prepare_binprm() calls from the > - * binfmt_script/misc handlers) resulted in elevated > - * privileges. > + * True if most recent call to cap_bprm_set_creds > + * resulted in elevated privileges. > */ > cap_elevated:1, > /* > - * Set by bprm_set_creds hook to indicate a privilege-gaining > - * exec has happened. Used to sanitize execution environment > - * and to set AT_SECURE auxv for glibc. > + * Set by bprm_creds_for_exec hook to indicate a > + * privilege-gaining exec has happened. Used to set > + * AT_SECURE auxv for glibc. > */ > secureexec:1, > /* > diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h > index 9cd4455528e5..aab0695f41df 100644 > --- a/include/linux/lsm_hook_defs.h > +++ b/include/linux/lsm_hook_defs.h > @@ -49,6 +49,7 @@ LSM_HOOK(int, 0, syslog, int type) > LSM_HOOK(int, 0, settime, const struct timespec64 *ts, > const struct timezone *tz) > LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages) > +LSM_HOOK(int, 0, bprm_creds_for_exec, struct linux_binprm *bprm) > LSM_HOOK(int, 0, bprm_set_creds, struct linux_binprm *bprm) > LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm) > LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm) > diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h > index 988ca0df7824..c719af37df20 100644 > --- a/include/linux/lsm_hooks.h > +++ b/include/linux/lsm_hooks.h > @@ -34,40 +34,46 @@ > * > * Security hooks for program execution operations. > * > + * @bprm_creds_for_exec: > + * If the setup in prepare_exec_creds did not setup @bprm->cred->security > + * properly for executing @bprm->file, update the LSM's portion of > + * @bprm->cred->security to be what commit_creds needs to install for the > + * new program. This hook may also optionally check permissions > + * (e.g. 
for transitions between security domains). > + * The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to > + * request libc enable secure mode. > + * @bprm contains the linux_binprm structure. > + * Return 0 if the hook is successful and permission is granted. > * @bprm_set_creds: > - * Save security information in the bprm->security field, typically based > - * on information about the bprm->file, for later use by the apply_creds > - * hook. This hook may also optionally check permissions (e.g. for > + * Assuming that the relevant bits of @bprm->cred->security have been > + * previously set, examine @bprm->file and regenerate them. This is > + * so that the credentials derived from the interpreter the code is > + * actually going to run are used rather than credentials derived > + * from a script. This done because the interpreter binary needs to > + * reopen script, and may end up opening something completely different. > + * This hook may also optionally check permissions (e.g. for > * transitions between security domains). > - * This hook may be called multiple times during a single execve, e.g. for > - * interpreters. The hook can tell whether it has already been called by > - * checking to see if @bprm->security is non-NULL. If so, then the hook > - * may decide either to retain the security information saved earlier or > - * to replace it. The hook must set @bprm->secureexec to 1 if a "secure > - * exec" has happened as a result of this hook call. The flag is used to > - * indicate the need for a sanitized execution environment, and is also > - * passed in the ELF auxiliary table on the initial stack to indicate > - * whether libc should enable secure mode. > + * The hook must set @bprm->cap_elevated to 1 if AT_SECURE should be set to > + * request libc enable secure mode. > * @bprm contains the linux_binprm structure. > * Return 0 if the hook is successful and permission is granted. 
> * @bprm_check_security: > * This hook mediates the point when a search for a binary handler will > - * begin. It allows a check the @bprm->security value which is set in the > - * preceding set_creds call. The primary difference from set_creds is > - * that the argv list and envp list are reliably available in @bprm. This > - * hook may be called multiple times during a single execve; and in each > - * pass set_creds is called first. > + * begin. It allows a check against the @bprm->cred->security value > + * which was set in the preceding creds_for_exec call. The argv list and > + * envp list are reliably available in @bprm. This hook may be called > + * multiple times during a single execve. > * @bprm contains the linux_binprm structure. > * Return 0 if the hook is successful and permission is granted. > * @bprm_committing_creds: > * Prepare to install the new security attributes of a process being > * transformed by an execve operation, based on the old credentials > * pointed to by @current->cred and the information set in @bprm->cred by > - * the bprm_set_creds hook. @bprm points to the linux_binprm structure. > - * This hook is a good place to perform state changes on the process such > - * as closing open file descriptors to which access will no longer be > - * granted when the attributes are changed. This is called immediately > - * before commit_creds(). > + * the bprm_creds_for_exec hook. @bprm points to the linux_binprm > + * structure. This hook is a good place to perform state changes on the > + * process such as closing open file descriptors to which access will no > + * longer be granted when the attributes are changed. This is called > + * immediately before commit_creds(). > * @bprm_committed_creds: > * Tidy up after the installation of the new security attributes of a > * process being transformed by an execve operation. 
The new credentials > diff --git a/include/linux/security.h b/include/linux/security.h > index a8d9310472df..1bd7a6582775 100644 > --- a/include/linux/security.h > +++ b/include/linux/security.h > @@ -276,6 +276,7 @@ int security_quota_on(struct dentry *dentry); > int security_syslog(int type); > int security_settime64(const struct timespec64 *ts, const struct timezone *tz); > int security_vm_enough_memory_mm(struct mm_struct *mm, long pages); > +int security_bprm_creds_for_exec(struct linux_binprm *bprm); > int security_bprm_set_creds(struct linux_binprm *bprm); > int security_bprm_check(struct linux_binprm *bprm); > void security_bprm_committing_creds(struct linux_binprm *bprm); > @@ -569,6 +570,11 @@ static inline int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) > return __vm_enough_memory(mm, pages, cap_vm_enough_memory(mm, pages)); > } > > +static inline int security_bprm_creds_for_exec(struct linux_binprm *bprm) > +{ > + return 0; > +} > + > static inline int security_bprm_set_creds(struct linux_binprm *bprm) > { > return cap_bprm_set_creds(bprm); > diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c > index 6ceb74e0f789..0b870a647488 100644 > --- a/security/apparmor/domain.c > +++ b/security/apparmor/domain.c > @@ -854,14 +854,14 @@ static struct aa_label *handle_onexec(struct aa_label *label, > } > > /** > - * apparmor_bprm_set_creds - set the new creds on the bprm struct > + * apparmor_bprm_creds_for_exec - Update the new creds on the bprm struct > * @bprm: binprm for the exec (NOT NULL) > * > * Returns: %0 or error on failure > * > * TODO: once the other paths are done see if we can't refactor into a fn > */ > -int apparmor_bprm_set_creds(struct linux_binprm *bprm) > +int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm) > { > struct aa_task_ctx *ctx; > struct aa_label *label, *new = NULL; > @@ -875,9 +875,6 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm) > file_inode(bprm->file)->i_mode > }; > > - if 
(bprm->called_set_creds) > - return 0; > - > ctx = task_ctx(current); > AA_BUG(!cred_label(bprm->cred)); > AA_BUG(!ctx); > diff --git a/security/apparmor/include/domain.h b/security/apparmor/include/domain.h > index 21b875fe2d37..d14928fe1c6f 100644 > --- a/security/apparmor/include/domain.h > +++ b/security/apparmor/include/domain.h > @@ -30,7 +30,7 @@ struct aa_domain { > struct aa_label *x_table_lookup(struct aa_profile *profile, u32 xindex, > const char **name); > > -int apparmor_bprm_set_creds(struct linux_binprm *bprm); > +int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm); > > void aa_free_domain_entries(struct aa_domain *domain); > int aa_change_hat(const char *hats[], int count, u64 token, int flags); > diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c > index b621ad74f54a..3623ab08279d 100644 > --- a/security/apparmor/lsm.c > +++ b/security/apparmor/lsm.c > @@ -1232,7 +1232,7 @@ static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = { > LSM_HOOK_INIT(cred_prepare, apparmor_cred_prepare), > LSM_HOOK_INIT(cred_transfer, apparmor_cred_transfer), > > - LSM_HOOK_INIT(bprm_set_creds, apparmor_bprm_set_creds), > + LSM_HOOK_INIT(bprm_creds_for_exec, apparmor_bprm_creds_for_exec), > LSM_HOOK_INIT(bprm_committing_creds, apparmor_bprm_committing_creds), > LSM_HOOK_INIT(bprm_committed_creds, apparmor_bprm_committed_creds), > > diff --git a/security/security.c b/security/security.c > index 7fed24b9d57e..4ee76a729f73 100644 > --- a/security/security.c > +++ b/security/security.c > @@ -823,6 +823,11 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) > return __vm_enough_memory(mm, pages, cap_sys_admin); > } > > +int security_bprm_creds_for_exec(struct linux_binprm *bprm) > +{ > + return call_int_hook(bprm_creds_for_exec, 0, bprm); > +} > + > int security_bprm_set_creds(struct linux_binprm *bprm) > { > return call_int_hook(bprm_set_creds, 0, bprm); > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c 
> index 0b4e32161b77..718345dd76bb 100644 > --- a/security/selinux/hooks.c > +++ b/security/selinux/hooks.c > @@ -2286,7 +2286,7 @@ static int check_nnp_nosuid(const struct linux_binprm *bprm, > return -EACCES; > } > > -static int selinux_bprm_set_creds(struct linux_binprm *bprm) > +static int selinux_bprm_creds_for_exec(struct linux_binprm *bprm) > { > const struct task_security_struct *old_tsec; > struct task_security_struct *new_tsec; > @@ -2297,8 +2297,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm) > > /* SELinux context only depends on initial program or script and not > * the script interpreter */ > - if (bprm->called_set_creds) > - return 0; > > old_tsec = selinux_cred(current_cred()); > new_tsec = selinux_cred(bprm->cred); > @@ -6385,7 +6383,7 @@ static int selinux_setprocattr(const char *name, void *value, size_t size) > /* Permission checking based on the specified context is > performed during the actual operation (execve, > open/mkdir/...), when we know the full context of the > - operation. See selinux_bprm_set_creds for the execve > + operation. See selinux_bprm_creds_for_exec for the execve > checks and may_create for the file creation checks. The > operation will then fail if the context is not permitted. 
*/ > tsec = selinux_cred(new); > @@ -6914,7 +6912,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { > > LSM_HOOK_INIT(netlink_send, selinux_netlink_send), > > - LSM_HOOK_INIT(bprm_set_creds, selinux_bprm_set_creds), > + LSM_HOOK_INIT(bprm_creds_for_exec, selinux_bprm_creds_for_exec), > LSM_HOOK_INIT(bprm_committing_creds, selinux_bprm_committing_creds), > LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds), > > diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c > index 8c61d175e195..0ac8f4518d07 100644 > --- a/security/smack/smack_lsm.c > +++ b/security/smack/smack_lsm.c > @@ -891,12 +891,12 @@ static int smack_sb_statfs(struct dentry *dentry) > */ > > /** > - * smack_bprm_set_creds - set creds for exec > + * smack_bprm_creds_for_exec - Update bprm->cred if needed for exec > * @bprm: the exec information > * > * Returns 0 if it gets a blob, -EPERM if exec forbidden and -ENOMEM otherwise > */ > -static int smack_bprm_set_creds(struct linux_binprm *bprm) > +static int smack_bprm_creds_for_exec(struct linux_binprm *bprm) > { > struct inode *inode = file_inode(bprm->file); > struct task_smack *bsp = smack_cred(bprm->cred); > @@ -904,9 +904,6 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm) > struct superblock_smack *sbsp; > int rc; > > - if (bprm->called_set_creds) > - return 0; > - > isp = smack_inode(inode); > if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task) > return 0; > @@ -4598,7 +4595,7 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = { > LSM_HOOK_INIT(sb_statfs, smack_sb_statfs), > LSM_HOOK_INIT(sb_set_mnt_opts, smack_set_mnt_opts), > > - LSM_HOOK_INIT(bprm_set_creds, smack_bprm_set_creds), > + LSM_HOOK_INIT(bprm_creds_for_exec, smack_bprm_creds_for_exec), > > LSM_HOOK_INIT(inode_alloc_security, smack_inode_alloc_security), > LSM_HOOK_INIT(inode_init_security, smack_inode_init_security), > diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c > 
index 716c92ec941a..f9adddc42ac8 100644 > --- a/security/tomoyo/tomoyo.c > +++ b/security/tomoyo/tomoyo.c > @@ -63,20 +63,14 @@ static void tomoyo_bprm_committed_creds(struct linux_binprm *bprm) > > #ifndef CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER > /** > - * tomoyo_bprm_set_creds - Target for security_bprm_set_creds(). > + * tomoyo_bprm_for_exec - Target for security_bprm_creds_for_exec(). > * > * @bprm: Pointer to "struct linux_binprm". > * > * Returns 0. > */ > -static int tomoyo_bprm_set_creds(struct linux_binprm *bprm) > +static int tomoyo_bprm_creds_for_exec(struct linux_binprm *bprm) > { > - /* > - * Do only if this function is called for the first time of an execve > - * operation. > - */ > - if (bprm->called_set_creds) > - return 0; > /* > * Load policy if /sbin/tomoyo-init exists and /sbin/init is requested > * for the first time. > @@ -539,7 +533,7 @@ static struct security_hook_list tomoyo_hooks[] __lsm_ro_after_init = { > LSM_HOOK_INIT(task_alloc, tomoyo_task_alloc), > LSM_HOOK_INIT(task_free, tomoyo_task_free), > #ifndef CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER > - LSM_HOOK_INIT(bprm_set_creds, tomoyo_bprm_set_creds), > + LSM_HOOK_INIT(bprm_creds_for_exec, tomoyo_bprm_creds_for_exec), > #endif > LSM_HOOK_INIT(bprm_check_security, tomoyo_bprm_check_security), > LSM_HOOK_INIT(file_fcntl, tomoyo_file_fcntl),
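
As an illustration of the split discussed in this patch, here is a tiny userspace model of the two hooks. It is a sketch only and every model_* name is hypothetical. It assumes, as in the patch above, that security_bprm_creds_for_exec runs once in __do_execve_file and that prepare_binprm (and with it security_bprm_set_creds) runs once for the primary file and once more for each interpreter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct linux_binprm we count. */
struct model_bprm {
	int remaining_interps;    /* interpreters still to be resolved */
	int creds_for_exec_calls; /* models security_bprm_creds_for_exec */
	int set_creds_calls;      /* models security_bprm_set_creds */
};

/* Once-per-exec hook: the unchanging part of the new creds. */
static int model_creds_for_exec(struct model_bprm *bprm)
{
	bprm->creds_for_exec_calls++;
	return 0;
}

/* Per-file hook: recomputed for the file that will really run. */
static int model_set_creds(struct model_bprm *bprm)
{
	bprm->set_creds_calls++;
	return 0;
}

/* Models prepare_binprm, which is entered once per candidate file. */
static int model_prepare_binprm(struct model_bprm *bprm)
{
	return model_set_creds(bprm);
}

/* Models the call ordering in __do_execve_file after this patch. */
static int model_do_execve(struct model_bprm *bprm)
{
	int ret;

	assert(bprm != NULL);
	ret = model_creds_for_exec(bprm);
	if (ret)
		return ret;

	for (;;) {
		ret = model_prepare_binprm(bprm);
		if (ret)
			return ret;
		if (bprm->remaining_interps == 0)
			return 0;
		bprm->remaining_interps--; /* move on to the interpreter */
	}
}
```

For a script whose interpreter is itself a script (two interpreters), the model calls the once-per-exec hook once and the per-file hook three times.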
On Mon, May 18, 2020 at 07:29:41PM -0500, Eric W. Biederman wrote:
>
> It is almost possible to use the result of prepare_exec_creds with no
> modifications during exec. Update prepare_exec_creds to initialize
> the suid and the fsuid to the euid, and the sgid and the fsgid to the
> egid. This is all that is needed to handle the common case of exec
> when nothing special like a setuid exec is happening.
>
> That this preserves the existing behavior of exec can be verified
> by examing bprm_fill_uid and cap_bprm_set_creds.
Yup, agreed.
> This change makes it clear that the later parts of exec that
> update bprm->cred are just need to handle special cases such
> as setuid exec and change of domains.
One question, though: why add this, since the repeat calling of the caps
LSM hook will do this? Is there a call ordering change here, or is this
just to make the new LSM hook more robust?
Regardless, this looks correct, if perhaps redundant. :)
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
On Mon, May 18, 2020 at 07:30:10PM -0500, Eric W. Biederman wrote:
>
> Today security_bprm_set_creds has several implementations:
> apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
> smack_bprm_set_creds, and tomoyo_bprm_set_creds.
>
> Except for cap_bprm_set_creds they all test bprm->called_set_creds and
> return immediately if it is true. The function cap_bprm_set_creds
> ignores bprm->calld_sed_creds entirely.
>
> Create a new LSM hook security_bprm_creds_for_exec that is called just
> before prepare_binprm in __do_execve_file, resulting in a LSM hook
> that is called exactly once for the entire of exec. Modify the bits
> of security_bprm_set_creds that only want to be called once per exec
> into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
> behind.
>
> Remove bprm->called_set_creds all of it's former users have been moved
> to security_bprm_creds_for_exec.
>
> Add or upate comments a appropriate to bring them up to date and
> to reflect this change.
Yup, awesome. One nit below.
Reviewed-by: Kees Cook <keescook@chromium.org>
> [...]
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 0b4e32161b77..718345dd76bb 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> [...]
> @@ -2297,8 +2297,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
>
> /* SELinux context only depends on initial program or script and not
> * the script interpreter */
> - if (bprm->called_set_creds)
> - return 0;
>
> old_tsec = selinux_cred(current_cred());
> new_tsec = selinux_cred(bprm->cred);
As you've done in the other LSMs, I think this comment can be removed
(or moved to the top of the function) too.
--
Kees Cook
On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
>
> Rename bprm->cap_elevated to bprm->active_secureexec and initialize it
> in prepare_binprm instead of in cap_bprm_set_creds. Initializing
> bprm->active_secureexec in prepare_binprm allows multiple
> implementations of security_bprm_repopulate_creds to play nicely with
> each other.
>
> Rename security_bprm_set_creds to security_bprm_reopulate_creds to
> emphasize that this path recomputes part of bprm->cred. This
> recomputation avoids the time of check vs time of use problems that
> are inherent in unix #! interpreters.
>
> In short two renames and a move in the location of initializing
> bprm->active_secureexec.
I like this much better than the direct call to the capabilities hook.
Thanks!
Reviewed-by: Kees Cook <keescook@chromium.org>
One nit is a bikeshed on the name "active_secureexec", since
the word "active" isn't really associated with any other part of the
binfmt logic. It's supposed to be "latest state from the binfmt loop",
so instead of "active", I considered these words that I also didn't
like: "current", "this", "recent", and "now". Is "latest" better than
"active"? Probably not.
> [...]
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index d1217fcdedea..8605ab4a0f89 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -27,10 +27,10 @@ struct linux_binprm {
> unsigned long argmin; /* rlimit marker for copy_strings() */
> unsigned int
> /*
> - * True if most recent call to cap_bprm_set_creds
> + * True if most recent call to security_bprm_set_creds
> * resulted in elevated privileges.
> */
> - cap_elevated:1,
> + active_secureexec:1,
Also, I'd like it if this comment could be made more verbose as well, for
anyone trying to understand the binfmt execution flow for the first time.
Perhaps:
/*
 * Must be set True during any call to the
 * bprm_set_creds hook where the execution would
 * result in elevated privileges. (The hook can be
 * called multiple times during nested interpreter
 * resolution across binfmt_script, binfmt_misc, etc).
 */
--
Kees Cook
On Mon, May 18, 2020 at 07:31:51PM -0500, Eric W. Biederman wrote:
>
> Add a flag preserve_creds that binfmt_misc can set to prevent
> credentials from being updated. This allows binfmt_misc to always
> call prepare_binfmt. Allowing the credential computation logic to be
typo: prepare_binprm()
> consolidated.
>
> Not replacing the credentials with the interpreters credentials is
> safe because because an open file descriptor to the executable is
> passed to the interpreter. As the interpreter does not need to
> reopen the executable it is guaranteed to see the same file that
> exec sees.
Yup, looks good. Note below on comment.
Reviewed-by: Kees Cook <keescook@chromium.org>
> [...]
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 8605ab4a0f89..dbb5614d62a2 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -26,6 +26,8 @@ struct linux_binprm {
> unsigned long p; /* current top of mem */
> unsigned long argmin; /* rlimit marker for copy_strings() */
> unsigned int
> + /* It is safe to use the creds of a script (see binfmt_misc) */
> + preserve_creds:1,
How about:
/*
 * A binfmt handler will set this to True before calling
 * prepare_binprm() if it is safe to reuse the previous
 * credentials, based on bprm->file (see binfmt_misc).
 */
--
Kees Cook
On Mon, May 18, 2020 at 07:32:18PM -0500, Eric W. Biederman wrote:
>
> The code in prepare_binary_handler needs to be run every time
> search_binary_handler is called so move the call into search_binary_handler
> itself to make the code simpler and easier to understand.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
On Tue, May 19, 2020 at 11:03 AM Kees Cook <keescook@chromium.org> wrote:
>
> One question, though: why add this, since the repeat calling of the caps
> LSM hook will do this?
I assume it's for the "preserve_creds" case where we don't even end up
setting creds at all.
Yeah, at some point we'll hit a bprm handler that doesn't set
'preserve_creds', and it all does get set in the end, but that's not
statically all that obvious.
I think it makes sense to initialize as much as possible from the
generic code, and rely as little as possible on what the binfmt
handlers end up actually doing.
Linus
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Tue, May 19, 2020 at 11:03 AM Kees Cook <keescook@chromium.org> wrote:
>>
>> One question, though: why add this, since the repeat calling of the caps
>> LSM hook will do this?
>
> I assume it's for the "preserve_creds" case where we don't even end up
> setting creds at all.
>
> Yeah, at some point we'll hit a bprm handler that doesn't set
> 'preserve_creds', and it all does get set in the end, but that's not
> statically all that obvious.
>
> I think it makes sense to initialize as much as possible from the
> generic code, and rely as little as possible on what the binfmt
> handlers end up actually doing.
Where this initially came from was I was looking at how to clean up the
case of no_new_privs/ptrace of a suid executable when we don't have
enough permissions. Just being able to create creds that kept
everything as they were looked very useful and there was just this one
little bit missing.
I included the change to prepare_exec_creds in this patchset to
emphasize that neither security_bprm_creds_for_exec nor
security_bprm_repopulate_creds need to do anything if there is nothing
special going on.
At the very least that helps me think through what the LSMs are required
to do, and what those hooks are for. AKA privilege changing execs.
So I was thinking of relying on the LSMs as little as possible, rather than
relying on the binfmt handlers as little as possible. But it is the same
idea.
And yes it makes everything easier to analyze if everything starts off
in a known good state.
Eric
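[Editorial illustration, not part of the original thread.] The "known good state" point can be sketched in userspace. This is only a model under stated assumptions: struct model_cred and model_prepare_exec_creds() are hypothetical names that stand in for the uid/gid fields of the kernel's struct cred and for the prepare_exec_creds change described in the patch; it is not the kernel code.

```c
#include <stdint.h>

/* Hypothetical userspace stand-in for the uid/gid part of struct cred. */
struct model_cred {
	uint32_t uid, euid, suid, fsuid;
	uint32_t gid, egid, sgid, fsgid;
};

/*
 * Model of the described change: start from a copy of the current
 * credentials, then set the saved and filesystem ids from the effective
 * ids. For an ordinary (non-setuid) exec nothing else has to touch the
 * result, so later hooks only need to handle the special cases.
 */
static struct model_cred model_prepare_exec_creds(const struct model_cred *cur)
{
	struct model_cred new_cred = *cur;

	new_cred.suid = new_cred.fsuid = new_cred.euid;
	new_cred.sgid = new_cred.fsgid = new_cred.egid;
	return new_cred;
}
```

In this model a later hook that finds nothing special can simply return without touching the credentials, which is the property Eric describes wanting from both LSM hooks.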
Kees Cook <keescook@chromium.org> writes:
> On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
>>
>> Rename bprm->cap_elevated to bprm->active_secureexec and initialize it
>> in prepare_binprm instead of in cap_bprm_set_creds. Initializing
>> bprm->active_secureexec in prepare_binprm allows multiple
>> implementations of security_bprm_repopulate_creds to play nicely with
>> each other.
>>
>> Rename security_bprm_set_creds to security_bprm_reopulate_creds to
>> emphasize that this path recomputes part of bprm->cred. This
>> recomputation avoids the time of check vs time of use problems that
>> are inherent in unix #! interpreters.
>>
>> In short two renames and a move in the location of initializing
>> bprm->active_secureexec.
>
> I like this much better than the direct call to the capabilities hook.
> Thanks!
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
> One nit is a bikeshed on the name "active_secureexec", since
> the word "active" isn't really associated with any other part of the
> binfmt logic. It's supposed to be "latest state from the binfmt loop",
> so instead of "active", I considered these words that I also didn't
> like: "current", "this", "recent", and "now". Is "latest" better than
> "active"? Probably not.
I had pretty much the same problem. Active at least conveys that it is
still malleable and might change.
>> [...]
>> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
>> index d1217fcdedea..8605ab4a0f89 100644
>> --- a/include/linux/binfmts.h
>> +++ b/include/linux/binfmts.h
>> @@ -27,10 +27,10 @@ struct linux_binprm {
>> unsigned long argmin; /* rlimit marker for copy_strings() */
>> unsigned int
>> /*
>> - * True if most recent call to cap_bprm_set_creds
>> + * True if most recent call to security_bprm_set_creds
>> * resulted in elevated privileges.
>> */
>> - cap_elevated:1,
>> + active_secureexec:1,
>
> Also, I'd like it if this comment could be made more verbose as well, for
> anyone trying to understand the binfmt execution flow for the first time.
> Perhaps:
>
> /*
> * Must be set True during the any call to
> * bprm_set_creds hook where the execution would
> * reuslt in elevated privileges. (The hook can be
> * called multiple times during nested interpreter
> * resolution across binfmt_script, binfmt_misc, etc).
> */
Well it is not during but after the call that it becomes true.
I think most recent covers the case of multiple calls.
I think having the loop explicitly in the code a few patches
later makes it clear that there is a loop dealing with interpreters.
Conciseness has a virtue in that it is easy to absorb. Seeing
active says most recent and secureexec does not is enough to ask
questions and look at the code.
Eric
On Mon, May 18, 2020 at 07:33:21PM -0500, Eric W. Biederman wrote:
>
> The return code -ENOEXEC serves to tell search_binary_handler that it
> should continue searching for the binfmt to handle a given file. This
> makes return -ENOEXEC with a bprm->buf that is needed to continue the
> search problematic.
>
> The current binfmt_script manages to escape problems as it closes and
> clears bprm->file before return -ENOEXEC with bprm->buf modified.
> This prevents search_binary_handler from looping as it explicitly
> handles a NULL bprm->file.
>
> I plan on moving all of the bprm->file managment into fs/exec.c and out
> of the binary handlers so this will become a problem.
>
> Move closing bprm->file and the test for BINPRM_PATH_INACCESSIBLE
> down below the last return of -ENOEXEC.
>
> Introduce i_sep and i_end to track the end of the first argument and
> the end of the parameters respectively. Using those, constification
> of all char * pointers, and the helpers next_terminator and
> next_non_spacetab guarantee the parameter parsing will not modify
> bprm->buf.
I'm quite pleased this could be implemented using the existing helpers! It
seems Linus and I were on the right track with these. :)
>
> Only modify bprm->buf to terminate the strings i_arg and i_name with
> '\0' for passing to copy_strings_kernel.
>
> When replacing loops with next_non_spacetab and next_terminator care
> has been take that the logic of the parsing code (short of replacing
> characters by '\0') remains the same.
Ah, interesting. As in, bprm->buf must not be modified unless the binfmt
handler is going to succeed. I think this requirement should be
documented in the binfmt struct header file.
> [...]
> diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
> index 8d718d8fd0fe..85e0ef86eb11 100644
> --- a/fs/binfmt_script.c
> +++ b/fs/binfmt_script.c
> @@ -71,39 +56,48 @@ static int load_script(struct linux_binprm *bprm)
> * parse them on its own.
> */
> buf_end = bprm->buf + sizeof(bprm->buf) - 1;
> - cp = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
> - if (!cp) {
> - cp = next_non_spacetab(bprm->buf + 2, buf_end);
> - if (!cp)
> + i_end = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
> + if (!i_end) {
> + i_end = next_non_spacetab(bprm->buf + 2, buf_end);
> + if (!i_end)
> return -ENOEXEC; /* Entire buf is spaces/tabs */
> /*
> * If there is no later space/tab/NUL we must assume the
> * interpreter path is truncated.
> */
> - if (!next_terminator(cp, buf_end))
> + if (!next_terminator(i_end, buf_end))
> return -ENOEXEC;
> - cp = buf_end;
> + i_end = buf_end;
> }
> - /* NUL-terminate the buffer and any trailing spaces/tabs. */
> - *cp = '\0';
> - while (cp > bprm->buf) {
> - cp--;
> - if ((*cp == ' ') || (*cp == '\t'))
> - *cp = '\0';
> - else
> - break;
> - }
> - for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
> - if (*cp == '\0')
> + /* Trim any trailing spaces/tabs from i_end */
> + while (spacetab(i_end[-1]))
> + i_end--;
> +
> + /* Skip over leading spaces/tabs */
> + i_name = next_non_spacetab(bprm->buf+2, i_end);
> + if (!i_name || (i_name == i_end))
> return -ENOEXEC; /* No interpreter name found */
> - i_name = cp;
> +
> + /* Is there an optional argument? */
> i_arg = NULL;
> - for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
> - /* nothing */ ;
> - while ((*cp == ' ') || (*cp == '\t'))
> - *cp++ = '\0';
> - if (*cp)
> - i_arg = cp;
> + i_sep = next_terminator(i_name, i_end);
> + if (i_sep && (*i_sep != '\0'))
> + i_arg = next_non_spacetab(i_sep, i_end);
> +
> + /*
> + * If the script filename will be inaccessible after exec, typically
> + * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
> + * up now (on the assumption that the interpreter will want to load
> + * this file).
> + */
> + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
> + return -ENOENT;
> +
> + /* Release since we are not mapping a binary into memory. */
> + allow_write_access(bprm->file);
> + fput(bprm->file);
> + bprm->file = NULL;
> +
> /*
> * OK, we've parsed out the interpreter name and
> * (optional) argument.
> @@ -121,7 +115,9 @@ static int load_script(struct linux_binprm *bprm)
> if (retval < 0)
> return retval;
> bprm->argc++;
> + *((char *)i_end) = '\0';
> if (i_arg) {
> + *((char *)i_sep) = '\0';
> retval = copy_strings_kernel(1, &i_arg, bprm);
> if (retval < 0)
> return retval;
I think this is all correct, though I'm always suspicious of my visual
inspection of string parsers. ;)
I had a worry the \n was not handled correctly in some case. I.e. before
any \n was converted into \0, and so next_terminator() didn't need to
consider \n separately. (next_non_spacetab() doesn't care since \n and \0
are both not ' ' nor '\t'.) For next_terminator(), though, I was worried
there was a case where *i_end == '\n', and next_terminator()
will return NULL instead of "last" due to *last being '\n' instead of
'\0', causing a problem, but you're using the adjusted i_end so I think
it's correct. And you've handled i_name == i_end.
I will see if I can find my testing scripts I used when commit
b5372fe5dc84 originally landed to double-check... until then:
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
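[Editorial illustration, not part of the original thread.] For readers who want to poke at the parsing logic discussed here, the following is a minimal userspace sketch of "parse the #! line without writing to the buffer". The helper names spacetab, next_non_spacetab and next_terminator come from the quoted patch, but their bodies here are assumptions written for this sketch; struct shebang, find_newline(), bounded_len() and parse_shebang() are hypothetical, and -1 stands in for -ENOEXEC. Lengths are returned instead of writing '\0' terminators.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type: pointers into the caller's read-only buffer. */
struct shebang {
	const char *name;
	size_t name_len;
	const char *arg;	/* NULL when there is no optional argument */
	size_t arg_len;
};

static bool spacetab(char c)
{
	return c == ' ' || c == '\t';
}

/* First character in [first, last] that is not a space or tab, or NULL. */
static const char *next_non_spacetab(const char *first, const char *last)
{
	for (; first <= last; first++)
		if (!spacetab(*first))
			return first;
	return NULL;
}

/* First space, tab or NUL in [first, last], or NULL. */
static const char *next_terminator(const char *first, const char *last)
{
	for (; first <= last; first++)
		if (spacetab(*first) || *first == '\0')
			return first;
	return NULL;
}

/* Bounded search for '\n' that also stops at the first NUL. */
static const char *find_newline(const char *s, size_t count)
{
	for (; count--; s++) {
		if (*s == '\n')
			return s;
		if (*s == '\0')
			break;
	}
	return NULL;
}

/* Length of the string at s, stopping at end or at the first NUL. */
static size_t bounded_len(const char *s, const char *end)
{
	size_t n = 0;

	while (s + n < end && s[n] != '\0')
		n++;
	return n;
}

/* Returns 0 on success, -1 (standing in for -ENOEXEC) otherwise. */
static int parse_shebang(const char *buf, size_t len, struct shebang *out)
{
	const char *buf_end, *i_end, *i_name, *i_sep, *i_arg = NULL;

	if (len < 3 || buf[0] != '#' || buf[1] != '!')
		return -1;
	buf_end = buf + len - 1;
	i_end = find_newline(buf, len);
	if (!i_end) {
		i_end = next_non_spacetab(buf + 2, buf_end);
		if (!i_end)
			return -1;	/* entire buffer is spaces/tabs */
		if (!next_terminator(i_end, buf_end))
			return -1;	/* path may be truncated */
		i_end = buf_end;
	}
	/* Trim trailing spaces/tabs; stops at '!' at the latest. */
	while (spacetab(i_end[-1]))
		i_end--;
	i_name = next_non_spacetab(buf + 2, i_end);
	/* The NUL check is an extra guard added for this sketch only. */
	if (!i_name || i_name == i_end || *i_name == '\0')
		return -1;
	i_sep = next_terminator(i_name, i_end);
	if (i_sep && *i_sep != '\0')
		i_arg = next_non_spacetab(i_sep, i_end);

	out->name = i_name;
	out->name_len = bounded_len(i_name, i_sep ? i_sep : i_end);
	out->arg = i_arg;
	out->arg_len = i_arg ? bounded_len(i_arg, i_end) : 0;
	return 0;
}
```

Because every pointer is const, the compiler enforces the property under review: a failed parse cannot leave the buffer in a different state for the next handler.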
Kees Cook <keescook@chromium.org> writes:
> On Mon, May 18, 2020 at 07:31:51PM -0500, Eric W. Biederman wrote:
>>
>> Add a flag preserve_creds that binfmt_misc can set to prevent
>> credentials from being updated. This allows binfmt_misc to always
>> call prepare_binfmt. Allowing the credential computation logic to be
>
> typo: prepare_binprm()
Thank you.
>> consolidated.
>>
>> Not replacing the credentials with the interpreters credentials is
>> safe because because an open file descriptor to the executable is
>> passed to the interpreter. As the interpreter does not need to
>> reopen the executable it is guaranteed to see the same file that
>> exec sees.
>
> Yup, looks good. Note below on comment.
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
>> [...]
>> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
>> index 8605ab4a0f89..dbb5614d62a2 100644
>> --- a/include/linux/binfmts.h
>> +++ b/include/linux/binfmts.h
>> @@ -26,6 +26,8 @@ struct linux_binprm {
>> unsigned long p; /* current top of mem */
>> unsigned long argmin; /* rlimit marker for copy_strings() */
>> unsigned int
>> + /* It is safe to use the creds of a script (see binfmt_misc) */
>> + preserve_creds:1,
>
> How about:
>
> /*
> * A binfmt handler will set this to True before calling
> * prepare_binprm() if it is safe to reuse the previous
> * credentials, based on bprm->file (see binfmt_misc).
> */
I think that is more words saying less.
While I agree it might be better. I don't see what your comment adds to
the understanding. What do you see my comment not saying that is important?
Eric
On Tue, May 19, 2020 at 02:03:23PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
>
> > On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
> >> [...]
> >> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> >> index d1217fcdedea..8605ab4a0f89 100644
> >> --- a/include/linux/binfmts.h
> >> +++ b/include/linux/binfmts.h
> >> @@ -27,10 +27,10 @@ struct linux_binprm {
> >> unsigned long argmin; /* rlimit marker for copy_strings() */
> >> unsigned int
> >> /*
> >> - * True if most recent call to cap_bprm_set_creds
> >> + * True if most recent call to security_bprm_set_creds
> >> * resulted in elevated privileges.
> >> */
> >> - cap_elevated:1,
> >> + active_secureexec:1,
> >
> > Also, I'd like it if this comment could be made more verbose as well, for
> > anyone trying to understand the binfmt execution flow for the first time.
> > Perhaps:
> >
> > /*
> > * Must be set True during the any call to
> > * bprm_set_creds hook where the execution would
> > * reuslt in elevated privileges. (The hook can be
> > * called multiple times during nested interpreter
> > * resolution across binfmt_script, binfmt_misc, etc).
> > */
> Well it is not during but after the call that it becomes true.
> I think most recent covers the case of multiple calls.
I'm thinking of an LSM writer reading these comments to decide what they
need to do to the flags, so it's a direction to them to set it to true if
they have determined that privilege was gained. (Though in theory, this is
all moot since only the commoncap hook cares.)
> I think having the loop explicitly in the code a few patches
> later makes it clear that there is a loop dealing with interpreters.
>
> Conciseness has a virtue in that it is easy to absorb. Seeing
> active says most recent and secureexec does not is enough to ask
> questions and look at the code.
I still think a hint about the nature of nested exec resolution would be
nice in here somewhere, especially given that this value is zeroed before
each call to the hook.
--
Kees Cook
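[Editorial illustration, not part of the original thread.] The "zeroed before each call, set by whichever hook sees elevation" behaviour being debated can be modelled in a few lines of userspace C. Everything here is hypothetical naming for the sketch (struct model_exec_state, model_hook, model_prepare_binprm and the two hooks); it only assumes what the thread states, namely that the flag is reset before the hooks run and reflects the most recent pass.

```c
#include <stddef.h>

/* Hypothetical model of the relevant bits of struct linux_binprm. */
struct model_exec_state {
	unsigned int active_secureexec:1;
	int file_is_setuid;	/* property of the file currently examined */
};

typedef void (*model_hook)(struct model_exec_state *);

/* A hook may set the flag when it sees elevation; it never clears it. */
static void model_hook_caps(struct model_exec_state *st)
{
	if (st->file_is_setuid)
		st->active_secureexec = 1;
}

/* A hook with nothing to report leaves the flag alone. */
static void model_hook_other(struct model_exec_state *st)
{
	(void)st;
}

/*
 * Model of one pass: the generic code resets the flag, then every hook
 * runs. Because the reset lives outside the hooks, several hooks can
 * coexist, and the flag always describes the most recent pass only.
 */
static void model_prepare_binprm(struct model_exec_state *st,
				 const model_hook *hooks, size_t nhooks)
{
	size_t i;

	st->active_secureexec = 0;
	for (i = 0; i < nhooks; i++)
		hooks[i](st);
}
```

Running the pass once for a setuid script and again for its non-setuid interpreter leaves the flag clear, which is the "most recent call" semantics the field comment tries to capture.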
On Tue, May 19, 2020 at 02:08:34PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
>
> > On Mon, May 18, 2020 at 07:31:51PM -0500, Eric W. Biederman wrote:
> >> [...]
> >> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> >> index 8605ab4a0f89..dbb5614d62a2 100644
> >> --- a/include/linux/binfmts.h
> >> +++ b/include/linux/binfmts.h
> >> @@ -26,6 +26,8 @@ struct linux_binprm {
> >> unsigned long p; /* current top of mem */
> >> unsigned long argmin; /* rlimit marker for copy_strings() */
> >> unsigned int
> >> + /* It is safe to use the creds of a script (see binfmt_misc) */
> >> + preserve_creds:1,
> >
> > How about:
> >
> > /*
> > * A binfmt handler will set this to True before calling
> > * prepare_binprm() if it is safe to reuse the previous
> > * credentials, based on bprm->file (see binfmt_misc).
> > */
>
> I think that is more words saying less.
>
> While I agree it might be better. I don't see what your comment adds to
> the understanding. What do you see my comment not saying that is important?
I think your comment is aimed at the consumer of preserve_creds (i.e.
the fs/exec.c code), whereas I think the comment should be directed at
a binfmt author, who wants to answer the question "why would I set this
flag?" Though I strongly hope we never have new binfmts. ;)
--
Kees Cook
Kees Cook <keescook@chromium.org> writes:
> On Mon, May 18, 2020 at 07:33:21PM -0500, Eric W. Biederman wrote:
>>
>> When replacing loops with next_non_spacetab and next_terminator care
>> has been take that the logic of the parsing code (short of replacing
>> characters by '\0') remains the same.
>
> Ah, interesting. As in, bprm->buf must not be modified unless the binfmt
> handler is going to succeed. I think this requirement should be
> documented in the binfmt struct header file.
I think the best way to document this is to modify bprm->buf to be
"const char buf[BINPRM_BUF_SIZE]" or something like that and not
allow any modifications by anything except for the code that
initially reads in the contents of the file.
That unfortunately requires copy_strings_kernel which has become
copy_string_kernel to take a length. Then I don't need to modify the
buffer at all here.
I believe binfmt_script is a bit unique in wanting to modify the buffer
because it is parsing strings.
The requirement is that a binfmt should not modify bprm unless it will
succeed or fail with an error that is not -ENOEXEC. The fundamental
issue is that search_binary_handler will reuse bprm if -ENOEXEC is
returned.
Until the next patch there is an escape hatch by clearing and closing
bprm->file but that goes away. Which is why I need this patch.
I guess I can see adding a comment about the general case of not
changing bprm unless you are doing something other than returning
-ENOEXEC and letting the search continue.
Eric
>> [...]
>> diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
>> index 8d718d8fd0fe..85e0ef86eb11 100644
>> --- a/fs/binfmt_script.c
>> +++ b/fs/binfmt_script.c
>> @@ -71,39 +56,48 @@ static int load_script(struct linux_binprm *bprm)
>> * parse them on its own.
>> */
>> buf_end = bprm->buf + sizeof(bprm->buf) - 1;
>> - cp = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
>> - if (!cp) {
>> - cp = next_non_spacetab(bprm->buf + 2, buf_end);
>> - if (!cp)
>> + i_end = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
>> + if (!i_end) {
>> + i_end = next_non_spacetab(bprm->buf + 2, buf_end);
>> + if (!i_end)
>> return -ENOEXEC; /* Entire buf is spaces/tabs */
>> /*
>> * If there is no later space/tab/NUL we must assume the
>> * interpreter path is truncated.
>> */
>> - if (!next_terminator(cp, buf_end))
>> + if (!next_terminator(i_end, buf_end))
>> return -ENOEXEC;
>> - cp = buf_end;
>> + i_end = buf_end;
>> }
>> - /* NUL-terminate the buffer and any trailing spaces/tabs. */
>> - *cp = '\0';
>> - while (cp > bprm->buf) {
>> - cp--;
>> - if ((*cp == ' ') || (*cp == '\t'))
>> - *cp = '\0';
>> - else
>> - break;
>> - }
>> - for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
>> - if (*cp == '\0')
>> + /* Trim any trailing spaces/tabs from i_end */
>> + while (spacetab(i_end[-1]))
>> + i_end--;
>> +
>> + /* Skip over leading spaces/tabs */
>> + i_name = next_non_spacetab(bprm->buf+2, i_end);
>> + if (!i_name || (i_name == i_end))
>> return -ENOEXEC; /* No interpreter name found */
>> - i_name = cp;
>> +
>> + /* Is there an optional argument? */
>> i_arg = NULL;
>> - for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
>> - /* nothing */ ;
>> - while ((*cp == ' ') || (*cp == '\t'))
>> - *cp++ = '\0';
>> - if (*cp)
>> - i_arg = cp;
>> + i_sep = next_terminator(i_name, i_end);
>> + if (i_sep && (*i_sep != '\0'))
>> + i_arg = next_non_spacetab(i_sep, i_end);
>> +
>> + /*
>> + * If the script filename will be inaccessible after exec, typically
>> + * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
>> + * up now (on the assumption that the interpreter will want to load
>> + * this file).
>> + */
>> + if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
>> + return -ENOENT;
>> +
>> + /* Release since we are not mapping a binary into memory. */
>> + allow_write_access(bprm->file);
>> + fput(bprm->file);
>> + bprm->file = NULL;
>> +
>> /*
>> * OK, we've parsed out the interpreter name and
>> * (optional) argument.
>> @@ -121,7 +115,9 @@ static int load_script(struct linux_binprm *bprm)
>> if (retval < 0)
>> return retval;
>> bprm->argc++;
>> + *((char *)i_end) = '\0';
>> if (i_arg) {
>> + *((char *)i_sep) = '\0';
>> retval = copy_strings_kernel(1, &i_arg, bprm);
>> if (retval < 0)
>> return retval;
>
> I think this is all correct, though I'm always suspicious of my visual
> inspection of string parsers. ;)
>
> I had a worry the \n was not handled correctly in some case. I.e. before
> any \n was converted into \0, and so next_terminator() didn't need to
> consider \n separately. (next_non_spacetab() doesn't care since \n and \0
> are both not ' ' nor '\t'.) For next_terminator(), though, I was worried
> there was a case where *i_end == '\n', and next_terminator()
> will return NULL instead of "last" due to *last being '\n' instead of
> '\0', causing a problem, but you're using the adjusted i_end so I think
> it's correct. And you've handled i_name == i_end.
>
> I will see if I can find my testing scripts I used when commit
> b5372fe5dc84 originally landed to double-check... until then:
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
On Mon, May 18, 2020 at 07:33:46PM -0500, Eric W. Biederman wrote:
>
> Most of the support for passing the file descriptor of an executable
> to an interpreter already lives in the generic code and in binfmt_elf.
> Rework the fields in binfmt_elf that deal with executable file
> descriptor passing to make executable file descriptor passing a first
> class concept.
>
> Move the fd_install from binfmt_misc into begin_new_exec after the new
> creds have been installed. This means that accessing the file through
> /proc/<pid>/fd/N is able to see the creds for the new executable
> before allowing access to the new executables files.
>
> Performing the install of the executables file descriptor after
> the point of no return also means that nothing special needs to
> be done on error. The exiting of the process will close all
> of it's open files.
>
> Move the would_dump from binfmt_misc into begin_new_exec right
> after would_dump is called on the bprm->file. This makes it
> obvious this case exists and that no nesting of bprm->file is
> currently supported.
>
> In binfmt_misc the movement of fd_install into generic code means
> that it's special error exit path is no longer needed.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Yes, this is so much nicer. :) My head did spin a little between changing
the management of bprm->executable between this patch and the next, but
I'm okay now. ;)
Reviewed-by: Kees Cook <keescook@chromium.org>
nits/thoughts below...
> [...]
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 8c7779d6bf19..653508b25815 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> [...]
> @@ -48,6 +51,7 @@ struct linux_binprm {
> unsigned int taso:1;
> #endif
> unsigned int recursion_depth; /* only for search_binary_handler() */
> + struct file * executable; /* Executable to pass to the interpreter */
> struct file * file;
> struct cred *cred; /* new credentials */
nit: can we fix the "* " stuff here? This should be *file and
*executable.
> [...]
> @@ -69,10 +73,6 @@ struct linux_binprm {
> #define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
> #define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)
>
> -/* fd of the binary should be passed to the interpreter */
> -#define BINPRM_FLAGS_EXECFD_BIT 1
> -#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)
> -
> /* filename of the binary will be inaccessible after exec */
> #define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
> #define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
nit: may as well renumber BINPRM_FLAGS_PATH_INACCESSIBLE_BIT to 1, they're
not UAPI. And, actually, nothing uses the *_BIT defines, so probably the
entire chunk of code could just be reduced to:
/* either interpreter or executable was unreadable */
#define BINPRM_FLAGS_ENFORCE_NONDUMP BIT(0)
/* filename of the binary will be inaccessible after exec */
#define BINPRM_FLAGS_PATH_INACCESSIBLE BIT(1)
Though frankly, I wonder if interp_flags could just be removed in favor
of two new bit members, especially since interp_data is gone:
+ /* Either interpreter or executable was unreadable. */
+ nondumpable:1;
+ /* Filename of the binary will be inaccessible after exec. */
+ path_inaccessible:1;
...
- unsigned interp_flags;
...etc
--
Kees Cook
On Tue, May 19, 2020 at 12:46 PM Kees Cook <keescook@chromium.org> wrote:
>
> Though frankly, I wonder if interp_flags could just be removed in favor
> of two new bit members, especially since interp_data is gone:
Yeah, I think that might be a good cleanup - but please keep it as a
separate thing at the end of the series (or maybe the beginning)
Linus
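[Editorial illustration, not part of the original thread.] The two representations being compared, a flags word with BIT(n) masks versus one-bit struct members, can be put side by side in a small userspace sketch. The BIT() macro below is a local stand-in for the kernel macro, the two flag names and the member names nondumpable/path_inaccessible are taken from Kees's mail, and the struct and function names (model_bprm_bits, model_flags_to_bits) are hypothetical.

```c
/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1U << (n))

/* Flags-word form, as in the suggested reduced defines. */
#define BINPRM_FLAGS_ENFORCE_NONDUMP	BIT(0)
#define BINPRM_FLAGS_PATH_INACCESSIBLE	BIT(1)

/* Bitfield form: each flag becomes a named one-bit member. */
struct model_bprm_bits {
	unsigned int
		/* Either interpreter or executable was unreadable. */
		nondumpable:1,
		/* Filename of the binary will be inaccessible after exec. */
		path_inaccessible:1;
};

/* Translate the flags word into the equivalent bitfield members. */
static struct model_bprm_bits model_flags_to_bits(unsigned int interp_flags)
{
	struct model_bprm_bits bits = {
		.nondumpable = !!(interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP),
		.path_inaccessible =
			!!(interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE),
	};

	return bits;
}
```

With the bitfield form a test such as `if (bits.path_inaccessible)` replaces the mask-and-test idiom, at the cost of no longer being able to pass all flags around as a single integer.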
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Tue, May 19, 2020 at 12:46 PM Kees Cook <keescook@chromium.org> wrote:
>>
>> Though frankly, I wonder if interp_flags could just be removed in favor
>> of two new bit members, especially since interp_data is gone:
>
> Yeah, I think that might be a good cleanup - but please keep it as a
> separate thing at the end of the series (or maybe the beginning)
I will.
With a little care we can replace setting BINPRM_FLAGS_ENFORCE_NONDUMP
and clearing bprm->mm->dumpable.
Which is the direction I have been looking.
Now that I think about it, I believe the loop in exec_binprm should
be clearing BINPRM_FLAGS_PATH_INACCESSIBLE, as it is only relevant to
fexecve/execveat with a close-on-exec file descriptor.
Eric
On Mon, May 18, 2020 at 07:34:19PM -0500, Eric W. Biederman wrote:
>
> Recursion in kernel code is generally a bad idea as it can overflow
> the kernel stack. Recursion in exec also hides that the code is
> looping and that the loop changes bprm->file.
>
> Instead of recursing in search_binary_handler have the methods that
> would recurse set bprm->interpreter and return 0. Modify exec_binprm
> to loop when bprm->interpreter is set. Consolidate all of the
> reassignments of bprm->file in that loop to make it clear what is
> going on.
>
> The structure of the new loop in exec_binprm is that all errors return
> immediately, while successful completion (ret == 0 &&
> !bprm->interpreter) just breaks out of the loop and runs what
> exec_binprm has always run upon successful completion.
>
> Fail if an interpreter is being called after execfd has been set.
> The code has never properly handled an interpreter being called with
> execfd being set and with reassignments of bprm->file and the
> assignment of bprm->executable in generic code it has finally become
> possible to test and fail when this problematic condition happens.
>
> With the reassignments of bprm->file and the assignment of
> bprm->executable moved into the generic code add a test to see if
> bprm->executable is being reassigned.
>
> In search_binary_handler remove the test for !bprm->file. With all
> reassignments of bprm->file moved to exec_binprm bprm->file can never
> be NULL in search_binary_handler.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Lovely!

Reviewed-by: Kees Cook <keescook@chromium.org>

I spent some time following the file lifetimes of deny/allow_write_access()
and the fget/fput() paths. It all looks correct to me; it's tricky
(especially bprm->executable) but so very much cleaner than before. :)

The only suggestion I could come up with is more comments (surprise) to
help anyone new to this loop realize what the "common" path is (and
similarly, a compiler hint too):

diff --git a/fs/exec.c b/fs/exec.c
index a9f421ec9e27..738051a698e1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1790,15 +1790,19 @@ static int exec_binprm(struct linux_binprm *bprm)
 	/* This allows 4 levels of binfmt rewrites before failing hard. */
 	for (depth = 0;; depth++) {
 		struct file *exec;
+
 		if (depth > 5)
 			return -ELOOP;
 
 		ret = search_binary_handler(bprm);
+		/* Unrecoverable error, give up. */
 		if (ret < 0)
 			return ret;
-		if (!bprm->interpreter)
+		/* Found final handler, start execution. */
+		if (likely(!bprm->interpreter))
 			break;
 
+		/* Found an interpreter, so try again and attempt to run it. */
 		exec = bprm->file;
 		bprm->file = bprm->interpreter;
 		bprm->interpreter = NULL;

-- 
Kees Cook
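For readers new to this loop, its control flow can be modelled in user space. The following is only a sketch under stated assumptions: `struct toy_binprm`, `toy_rules`, `toy_search_binary_handler` and `toy_exec_binprm` are hypothetical stand-ins invented for illustration and are not the kernel's code. They only mirror the shape described in the patch: errors return immediately, a handler that wants another pass sets `interpreter` and returns 0, and the depth check bounds the number of rewrites.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct linux_binprm: only what the loop needs. */
struct toy_binprm {
	const char *file;        /* file currently being examined */
	const char *interpreter; /* set by a handler that wants another pass */
	int passes;              /* number of handler searches performed */
};

/* Hypothetical handler table: a file and the interpreter it requests. */
struct toy_rule {
	const char *file;
	const char *interp;      /* NULL means "final handler, run it" */
};

static const struct toy_rule toy_rules[] = {
	{ "script.sh", "wrapper" },
	{ "wrapper",   "/bin/sh" },
	{ "/bin/sh",   NULL },
	{ "loop",      "loop" },  /* an interpreter chain that never ends */
};

/* One search pass: never recurses, only records the requested interpreter. */
int toy_search_binary_handler(struct toy_binprm *bprm)
{
	size_t i;

	assert(bprm->file != NULL); /* the loop guarantees a file is set */
	bprm->passes++;
	for (i = 0; i < sizeof(toy_rules) / sizeof(toy_rules[0]); i++) {
		if (strcmp(bprm->file, toy_rules[i].file) == 0) {
			bprm->interpreter = toy_rules[i].interp;
			return 0;
		}
	}
	return -ENOEXEC;
}

/* The iterative replacement for recursion, with the same depth limit. */
int toy_exec_binprm(struct toy_binprm *bprm)
{
	int depth, ret;

	for (depth = 0;; depth++) {
		if (depth > 5)
			return -ELOOP;

		ret = toy_search_binary_handler(bprm);
		if (ret < 0)
			return ret;          /* unrecoverable error */
		if (!bprm->interpreter)
			break;               /* found the final handler */

		/* All reassignment of ->file happens in this one place. */
		bprm->file = bprm->interpreter;
		bprm->interpreter = NULL;
	}
	return 0;
}
```

Starting from `"script.sh"` the model makes three passes and ends on `"/bin/sh"`, while the self-referencing `"loop"` entry is cut off with `-ELOOP` after six passes.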
On Tue, 19 May 2020, Kees Cook wrote:
> > /* SELinux context only depends on initial program or script and not
> > * the script interpreter */
> > - if (bprm->called_set_creds)
> > - return 0;
> >
> > old_tsec = selinux_cred(current_cred());
> > new_tsec = selinux_cred(bprm->cred);
>
> As you've done in the other LSMs, I think this comment can be removed
> (or moved to the top of the function) too.
I'd prefer moved to top of the function.
--
James Morris
<jmorris@namei.org>
On Mon, 18 May 2020, Eric W. Biederman wrote:
>
> The code in prepare_binprm needs to be run every time
> search_binary_handler is called so move the call into search_binary_handler
> itself to make the code simpler and easier to understand.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Nice cleanup.
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
--
James Morris
<jmorris@namei.org>
On Mon, 18 May 2020, Eric W. Biederman wrote:
> diff --git a/fs/exec.c b/fs/exec.c
> index 9e70da47f8d9..8e3b93d51d31 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1366,7 +1366,7 @@ int begin_new_exec(struct linux_binprm * bprm)
> * the final state of setuid/setgid/fscaps can be merged into the
> * secureexec flag.
> */
> - bprm->secureexec |= bprm->cap_elevated;
> + bprm->secureexec |= bprm->active_secureexec;
Which kernel tree are these patches for? Seems like begin_new_exec() is
from a prerequisite patchset.
--
James Morris
<jmorris@namei.org>
On Mon, May 18, 2020 at 07:29:00PM -0500, Eric W. Biederman wrote:
> arch/alpha/kernel/binfmt_loader.c | 11 +----
> fs/binfmt_elf.c | 4 +-
> fs/binfmt_elf_fdpic.c | 4 +-
> fs/binfmt_em86.c | 13 +----
> fs/binfmt_misc.c | 69 ++++-----------------------
> fs/binfmt_script.c | 82 ++++++++++++++------------------
> fs/exec.c | 97 ++++++++++++++++++++++++++------------
> include/linux/binfmts.h | 36 ++++++--------
> include/linux/lsm_hook_defs.h | 3 +-
> include/linux/lsm_hooks.h | 52 +++++++++++---------
> include/linux/security.h | 14 ++++--
> kernel/cred.c | 3 ++
> security/apparmor/domain.c | 7 +--
> security/apparmor/include/domain.h | 2 +-
> security/apparmor/lsm.c | 2 +-
> security/commoncap.c | 9 ++--
> security/security.c | 9 +++-
> security/selinux/hooks.c | 8 ++--
> security/smack/smack_lsm.c | 9 ++--
> security/tomoyo/tomoyo.c | 12 ++---
> 20 files changed, 202 insertions(+), 244 deletions(-)

Oh, BTW, heads up on this (trivially but annoyingly) conflicting with
the copy_strings_kernel/copy_string_kernel change:

https://ozlabs.org/~akpm/mmotm/broken-out/exec-simplify-the-copy_strings_kernel-calling-convention.patch

Is it worth pulling that and these into your tree?

https://ozlabs.org/~akpm/mmotm/broken-out/exec-open-code-copy_string_kernel.patch

https://ozlabs.org/~akpm/mmotm/broken-out/umh-fix-refcount-underflow-in-fork_usermode_blob.patch

-- 
Kees Cook
On 5/18/20 7:33 PM, Eric W. Biederman wrote:
>
> Most of the support for passing the file descriptor of an executable
> to an interpreter already lives in the generic code and in binfmt_elf.
> Rework the fields in binfmt_elf that deal with executable file
> descriptor passing to make executable file descriptor passing a first
> class concept.

I was reading this to try to figure out how to do execve(NULL, argv[], envp) to
re-exec self after a vfork() in a chroot with no /proc, and hit the most trivial
quibble ever:

> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1323,7 +1323,10 @@ int begin_new_exec(struct linux_binprm * bprm)
> 	 */
> 	set_mm_exe_file(bprm->mm, bprm->file);
>
> +	/* If the binary is not readable than enforce mm->dumpable=0 */

then

Rob
James Morris <jmorris@namei.org> writes:
> On Mon, 18 May 2020, Eric W. Biederman wrote:
>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 9e70da47f8d9..8e3b93d51d31 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1366,7 +1366,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>> * the final state of setuid/setgid/fscaps can be merged into the
>> * secureexec flag.
>> */
>> - bprm->secureexec |= bprm->cap_elevated;
>> + bprm->secureexec |= bprm->active_secureexec;
>
> Which kernel tree are these patches for? Seems like begin_new_exec() is
> from a prerequisite patchset.
The base is:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next
I should have mentioned. I am several rounds deep in cleaning up exec
already.
begin_new_exec is essentially forget_old_exec.
Eric
Kees Cook <keescook@chromium.org> writes:
> On Mon, May 18, 2020 at 07:29:00PM -0500, Eric W. Biederman wrote:
>> arch/alpha/kernel/binfmt_loader.c | 11 +----
>> fs/binfmt_elf.c | 4 +-
>> fs/binfmt_elf_fdpic.c | 4 +-
>> fs/binfmt_em86.c | 13 +----
>> fs/binfmt_misc.c | 69 ++++-----------------------
>> fs/binfmt_script.c | 82 ++++++++++++++------------------
>> fs/exec.c | 97 ++++++++++++++++++++++++++------------
>> include/linux/binfmts.h | 36 ++++++--------
>> include/linux/lsm_hook_defs.h | 3 +-
>> include/linux/lsm_hooks.h | 52 +++++++++++---------
>> include/linux/security.h | 14 ++++--
>> kernel/cred.c | 3 ++
>> security/apparmor/domain.c | 7 +--
>> security/apparmor/include/domain.h | 2 +-
>> security/apparmor/lsm.c | 2 +-
>> security/commoncap.c | 9 ++--
>> security/security.c | 9 +++-
>> security/selinux/hooks.c | 8 ++--
>> security/smack/smack_lsm.c | 9 ++--
>> security/tomoyo/tomoyo.c | 12 ++---
>> 20 files changed, 202 insertions(+), 244 deletions(-)
>
> Oh, BTW, heads up on this (trivially but annoyingly) conflicting with
> the copy_strings_kernel/copy_string_kernel change:
>
> https://ozlabs.org/~akpm/mmotm/broken-out/exec-simplify-the-copy_strings_kernel-calling-convention.patch
>
> Is it worth pulling that and these into your tree?
>
> https://ozlabs.org/~akpm/mmotm/broken-out/exec-open-code-copy_string_kernel.patch
>
> https://ozlabs.org/~akpm/mmotm/broken-out/umh-fix-refcount-underflow-in-fork_usermode_blob.patch
Good question. It is part of the greater set_fs removal work, and I
don't want to mess that up.
I would love to give copy_string_kernel a length parameter so
binfmt_script did not have to modify its buffer or copy the string,
before calling copy_string_kernel.
Hmm. I already have to call strdup on i_name in bprm_change_interp.
So I probably just want to bite the bullet and figure out a way to do
strdup earlier.
So unless it makes things easier for Andrew I think it is probably
easier to live with the conflict for now, and use this conversation
as inspiration for my next round of cleanups of binfmt_misc.
Eric
Rob Landley <rob@landley.net> writes:

> On 5/18/20 7:33 PM, Eric W. Biederman wrote:
>>
>> Most of the support for passing the file descriptor of an executable
>> to an interpreter already lives in the generic code and in binfmt_elf.
>> Rework the fields in binfmt_elf that deal with executable file
>> descriptor passing to make executable file descriptor passing a first
>> class concept.
>
> I was reading this to try to figure out how to do execve(NULL, argv[], envp) to
> re-exec self after a vfork() in a chroot with no /proc, and hit the most trivial
> quibble ever:

We have /proc/self/exe today. If I understand you correctly you would
like to do the equivalent of 'execve("/proc/self/exe", argv[], envp[])'
without having proc mounted.

The file descriptor is stored in mm->exe_file.
Probably the most straight forward implementation is to allow
execveat(AT_EXE_FILE, ...).

You can look at binfmt_misc for how to reopen an open file descriptor.

>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1323,7 +1323,10 @@ int begin_new_exec(struct linux_binprm * bprm)
>> 	 */
>> 	set_mm_exe_file(bprm->mm, bprm->file);
>>
>> +	/* If the binary is not readable than enforce mm->dumpable=0 */
>
> then

It took me a minute yes good catch.

Eric
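For comparison, here is a rough sketch of what user space can already do today with the existing execveat(2) system call and AT_EMPTY_PATH. It is not the AT_EXE_FILE interface suggested above, which is only a proposal in this thread and does not exist. The helper names `exec_by_fd` and `reexec_self` are made up for illustration, and the raw syscall() is used on the assumption that the libc may lack an execveat() wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000 /* Linux UAPI value, in case libc headers hide it */
#endif

/*
 * Execute the file behind an already-open descriptor.
 * Returns -1 with errno set on failure; does not return on success.
 */
int exec_by_fd(int fd, char *const argv[], char *const envp[])
{
	assert(argv != NULL && argv[0] != NULL);
	return (int)syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
}

/*
 * Re-exec the running program. Opening /proc/self/exe is the step that
 * fails in a chroot without /proc; that open is the only use of procfs here.
 */
int reexec_self(char *const argv[], char *const envp[])
{
	int fd = open("/proc/self/exe", O_RDONLY);
	int ret;

	if (fd < 0)
		return -1;
	ret = exec_by_fd(fd, argv, envp);
	close(fd); /* only reached when the exec failed */
	return ret;
}
```

The descriptor is deliberately opened without O_CLOEXEC. For a plain ELF binary either choice works, but the execveat(2) man page documents that a close-on-exec descriptor referring to a script makes the exec fail with ENOENT, because the interpreter could no longer open the script.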
Kees Cook <keescook@chromium.org> writes:
> On Tue, May 19, 2020 at 02:03:23PM -0500, Eric W. Biederman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>>
>> > On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
>> >> [...]
>> >> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
>> >> index d1217fcdedea..8605ab4a0f89 100644
>> >> --- a/include/linux/binfmts.h
>> >> +++ b/include/linux/binfmts.h
>> >> @@ -27,10 +27,10 @@ struct linux_binprm {
>> >> unsigned long argmin; /* rlimit marker for copy_strings() */
>> >> unsigned int
>> >> /*
>> >> - * True if most recent call to cap_bprm_set_creds
>> >> + * True if most recent call to security_bprm_set_creds
>> >> * resulted in elevated privileges.
>> >> */
>> >> - cap_elevated:1,
>> >> + active_secureexec:1,
>> >
>> > Also, I'd like it if this comment could be made more verbose as well, for
>> > anyone trying to understand the binfmt execution flow for the first time.
>> > Perhaps:
>> >
>> > /*
>> > * Must be set True during the any call to
>> > * bprm_set_creds hook where the execution would
>> > * reuslt in elevated privileges. (The hook can be
>> > * called multiple times during nested interpreter
>> > * resolution across binfmt_script, binfmt_misc, etc).
>> > */
>> Well it is not during but after the call that it becomes true.
>> I think most recent covers the case of multiple calls.
>
> I'm thinking of an LSM writing reading these comments to decide what
> they need to do to the flags, so it's a direction to them to set it to
> true if they have determined that privilege was gained. (Though in
> theory, this is all moot since only the commoncap hook cares.)
The comments for an LSM writer are in include/linux/lsm_hooks.h
* @bprm_repopulate_creds:
* Assuming that the relevant bits of @bprm->cred->security have been
* previously set, examine @bprm->file and regenerate them. This is
* so that the credentials derived from the interpreter the code is
* actually going to run are used rather than credentials derived
* from a script. This done because the interpreter binary needs to
* reopen script, and may end up opening something completely different.
* This hook may also optionally check permissions (e.g. for
* transitions between security domains).
* The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
* request libc enable secure mode.
* @bprm contains the linux_binprm structure.
* Return 0 if the hook is successful and permission is granted.
I hope that is detailed enough.
I will leave the rest of the comments for the maintainer of the code.
I really don't think we should duplicate the prescriptive comments in
multiple locations.
Eric
On Wed, May 20, 2020 at 03:22:38PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
>
> > On Tue, May 19, 2020 at 02:03:23PM -0500, Eric W. Biederman wrote:
> >> Kees Cook <keescook@chromium.org> writes:
> >>
> >> > On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
> >> >> [...]
> >> >> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> >> >> index d1217fcdedea..8605ab4a0f89 100644
> >> >> --- a/include/linux/binfmts.h
> >> >> +++ b/include/linux/binfmts.h
> >> >> @@ -27,10 +27,10 @@ struct linux_binprm {
> >> >> unsigned long argmin; /* rlimit marker for copy_strings() */
> >> >> unsigned int
> >> >> /*
> >> >> - * True if most recent call to cap_bprm_set_creds
> >> >> + * True if most recent call to security_bprm_set_creds
> >> >> * resulted in elevated privileges.
> >> >> */
> >> >> - cap_elevated:1,
> >> >> + active_secureexec:1,
> >> >
> >> > Also, I'd like it if this comment could be made more verbose as well, for
> >> > anyone trying to understand the binfmt execution flow for the first time.
> >> > Perhaps:
> >> >
> >> > /*
> >> > * Must be set True during the any call to
> >> > * bprm_set_creds hook where the execution would
> >> > * reuslt in elevated privileges. (The hook can be
> >> > * called multiple times during nested interpreter
> >> > * resolution across binfmt_script, binfmt_misc, etc).
> >> > */
> >> Well it is not during but after the call that it becomes true.
> >> I think most recent covers the case of multiple calls.
> >
> > I'm thinking of an LSM writing reading these comments to decide what
> > they need to do to the flags, so it's a direction to them to set it to
> > true if they have determined that privilege was gained. (Though in
> > theory, this is all moot since only the commoncap hook cares.)
>
> The comments for an LSM writer are in include/linux/lsm_hooks.h
>
> * @bprm_repopulate_creds:
> * Assuming that the relevant bits of @bprm->cred->security have been
> * previously set, examine @bprm->file and regenerate them. This is
> * so that the credentials derived from the interpreter the code is
> * actually going to run are used rather than credentials derived
> * from a script. This done because the interpreter binary needs to
> * reopen script, and may end up opening something completely different.
> * This hook may also optionally check permissions (e.g. for
> * transitions between security domains).
> * The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
> * request libc enable secure mode.
> * @bprm contains the linux_binprm structure.
> * Return 0 if the hook is successful and permission is granted.
>
> I hope that is detailed enough.
>
> I will leave the rest of the comments for the maintainer of the code.
>
> I really don't think we should duplicate the prescriptive comments in
> multiple locations.
Okay, that's fair enough. Thanks!
--
Kees Cook
I have pushed this out to:

git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next

I have collected up the acks and reviewed-by's, and fixed a couple of
typos but that is it.

If we need comment fixes or additional cleanups we can apply that on top
of this series. This way the code can sit in linux-next until the
merge window opens.

Before I pushed this out I also tested this with Kees new test of
binfmt_misc and did not find any problems.

Eric

The git range-diff of the changes I applied before pushing this out:

1: f6bb0d6563ca ! 1: 87b047d2be41 exec: Teach prepare_exec_creds how exec treats uids & gids
    @@ Commit message
         update bprm->cred are just need to handle special cases such
         as setuid exec and change of domains.

    +    Link: https://lkml.kernel.org/r/871rng22dm.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## kernel/cred.c ##
2: d3b3594be22f ! 2: b8bff599261c exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
    @@ Commit message
         Add or upate comments a appropriate to bring them up to date and
         to reflect this change.

    +    Link: https://lkml.kernel.org/r/87v9kszrzh.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Acked-by: Casey Schaufler <casey@schaufler-ca.com> # For the LSM and Smack bits
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## fs/exec.c ##
3: 65c651a77967 ! 3: d9d67b76eed6 exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
    @@ Commit message
         In short two renames and a move in the location of initializing
         bprm->active_secureexec.

    +    Link: https://lkml.kernel.org/r/87o8qkzrxp.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## fs/exec.c ##
4: 6d0d5da2b45e ! 4: dbf17e846ea9 exec: Allow load_misc_binary to call prepare_binfmt unconditionally
    @@ Metadata
     Author: Eric W. Biederman <ebiederm@xmission.com>

      ## Commit message ##
    -    exec: Allow load_misc_binary to call prepare_binfmt unconditionally
    +    exec: Allow load_misc_binary to call prepare_binprm unconditionally

         Add a flag preserve_creds that binfmt_misc can set to prevent
         credentials from being updated. This allows binfmt_misc to always
    -    call prepare_binfmt. Allowing the credential computation logic to be
    +    call prepare_binprm. Allowing the credential computation logic to be
         consolidated.

         Not replacing the credentials with the interpreters credentials is
    @@ Commit message
         exec sees.

         Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
    +    Link: https://lkml.kernel.org/r/87imgszrwo.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## fs/binfmt_misc.c ##
5: af7db65c2483 ! 5: 8a8f3bb8ec41 exec: Move the call of prepare_binprm into search_binary_handler
    @@ Commit message
         search_binary_handler is called so move the call into search_binary_handler
         itself to make the code simpler and easier to understand.

    +    Link: https://lkml.kernel.org/r/87d070zrvx.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
    +    Reviewed-by: James Morris <jamorris@linux.microsoft.com>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## arch/alpha/kernel/binfmt_loader.c ##
6: 69fccdf33a87 ! 6: 01dbc34d75bf exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
    @@ Commit message
         has been take that the logic of the parsing code (short of replacing
         characters by '\0') remains the same.

    +    Link: https://lkml.kernel.org/r/874ksczru6.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## fs/binfmt_script.c ##
7: 30fe957c6dce ! 7: 6962a6b4de92 exec: Generic execfd support
    @@ Commit message
         In binfmt_misc the movement of fd_install into generic code means
         that it's special error exit path is no longer needed.

    +    Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## fs/binfmt_elf.c ##
    @@ fs/exec.c: int begin_new_exec(struct linux_binprm * bprm)
      	 */
      	set_mm_exe_file(bprm->mm, bprm->file);

    -+	/* If the binary is not readable than enforce mm->dumpable=0 */
    ++	/* If the binary is not readable then enforce mm->dumpable=0 */
      	would_dump(bprm, bprm->file);
     +	if (bprm->have_execfd)
     +		would_dump(bprm, bprm->executable);
8: f0a27d0fde69 ! 8: 226ce5863881 exec: Remove recursion from search_binary_handler
    @@ Commit message
         reassignments of bprm->file moved to exec_binprm bprm->file can never
         be NULL in search_binary_handler.

    +    Link: https://lkml.kernel.org/r/87sgfwyd84.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## arch/alpha/kernel/binfmt_loader.c ##
On Wed, May 20, 2020 at 05:12:10PM -0500, Eric W. Biederman wrote:
>
> I have pushed this out to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next
>
> I have collected up the acks and reviewed-by's, and fixed a couple of
> typos but that is it.

Awesome!

> If we need comment fixes or additional cleanups we can apply that on top
> of this series. This way the code can sit in linux-next until the
> merge window opens.
>
> Before I pushed this out I also tested this with Kees new test of
> binfmt_misc and did not find any problems.

Did this mean to say binfmt_script? It'd be nice to get a binfmt_misc
test too, though.

Thanks!

-Kees

-- 
Kees Cook
Kees Cook <keescook@chromium.org> writes:
> On Wed, May 20, 2020 at 05:12:10PM -0500, Eric W. Biederman wrote:
>>
>> I have pushed this out to:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next
>>
>> I have collected up the acks and reviewed-by's, and fixed a couple of
>> typos but that is it.
>
> Awesome!
>
>> If we need comment fixes or additional cleanups we can apply that on top
>> of this series. This way the code can sit in linux-next until the
>> merge window opens.
>>
>> Before I pushed this out I also tested this with Kees new test of
>> binfmt_misc and did not find any problems.
>
> Did this mean to say binfmt_script? It'd be nice to get a binfmt_misc
> test too, though.
Yes. Sorry. I meant your binfmt_script test.
Eric
On 5/20/20 11:05 AM, Eric W. Biederman wrote:
> Rob Landley <rob@landley.net> writes:
>
>> On 5/18/20 7:33 PM, Eric W. Biederman wrote:
>>>
>>> Most of the support for passing the file descriptor of an executable
>>> to an interpreter already lives in the generic code and in binfmt_elf.
>>> Rework the fields in binfmt_elf that deal with executable file
>>> descriptor passing to make executable file descriptor passing a first
>>> class concept.
>>
>> I was reading this to try to figure out how to do execve(NULL, argv[], envp) to
>> re-exec self after a vfork() in a chroot with no /proc, and hit the most trivial
>> quibble ever:
>
> We have /proc/self/exe today.

Not when you first enter a container that's just created a new namespace, or
initramfs first launches PID 1 and runs a shell script to set up the
environment and your (subshell) and background& support only has vfork and not
fork, or just plain "somebody did a chroot"...

(Yes a nommu system with range registers can want _security_ without
_address_translation_. Strange but true! I haven't actually sat down to try to
implement nommu containers yet, but I've done worse things on many occasions.
Remember: the S in IoT stands for Security.)

> If I understand you correctly you would
> like to do the equivalent of 'execve("/proc/self/exe", argv[], envp[])'
> without having proc mounted.

Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
bash-compatible shell with nommu support, which means in order to do subshell
and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
have the child exec itself to unblock the parent, and then read the context data
that just got discarded through the pipe from the parent. ("Wheee." And you can
quote me on that.)

I've implemented that already
(https://github.com/landley/toybox/blob/0.8.3/toys/pending/sh.c#L674 and reentry
is L2516, yeah it's a work in progress), but "exec self" requires /proc/self/exe
and since I gave up on getting
http://lkml.iu.edu/hypermail/linux/kernel/2005.1/09399.html in (I should
apologize to Randy but I just haven't got the spoons to face
https://landley.net/notes-2017.html#14-09-2017 again; three strikes and the
patch stays out) I need /init to be a shell script to set up an initramfs that's
made by pointing CONFIG_INITRAMFS_SOURCE at a directory that was made without
running the build as root, because there's no /dev/console and you can't mknod
as a non-root user.

Maybe instead of fixing CONFIG_DEVTMPFS_MOUNT to apply to initramfs I could
instead add a CONFIG_INITRAMFS_EXTRA=blah.txt to usr/{Kconfig,Makefile} to
append user-supplied extra lines to the end of the gen_initramfs.sh output and
make a /dev/console that way (kinda like genext2fs and mksquashfs), but getting
that in through the linux-kernel bureaucracy means consulting a 27 step
checklist supplementing the basic 17 step submission procedure (with
bibliographic references) explaining how to fill out the forms, perform the
validation steps, go through the proper channels, and get the appropriate series
of signatures and approvals, and I just haven't got the stomach for it anymore.
I was participating here as a hobbyist. Linux-kernel has aged into a rigid
bureaucracy. It's no fun anymore.

Which means any kernel patch I write I have to forward port regularly, sometimes
for a very long time. Heck, I gave linux-kernel three strikes at miniconfig
fifteen years ago now:

http://lkml.iu.edu/hypermail/linux/kernel/0511.2/0479.html
https://lwn.net/Articles/161086/
https://lkml.org/lkml/2006/7/6/404

And was still maintaining it out of tree a decade later:

https://landley.net/aboriginal/FAQ.html#dev_miniconfig
https://github.com/landley/aboriginal/blob/master/more/miniconfig.sh

These days I've moved on to a microconfig format that mostly fits on one line,
ala the KCONF= stuff in toybox's built in:

https://github.com/landley/toybox/blob/master/scripts/mkroot.sh#L136

For example, the User Mode Linux miniconfig from my ancient
https://landley.net/writing/docs/UML.html would translate to microconfig as:

BINFMT_ELF,HOSTFS,LBD,BLK_DEV,BLK_DEV_LOOP,STDERR_CONSOLE,UNIX98_PTYS,EXT2_FS

The current kernel also needs "64BIT" because my host toolchain doesn't have the
-m32 headers installed, but then it builds fine ala:

make ARCH=um allnoconfig KCONFIG_ALLCONFIG=<(echo BINFMT_ELF,HOSTFS,LBD,BLK_DEV,BLK_DEV_LOOP,STDERR_CONSOLE,UNIX98_PTYS,EXT2_FS,64BIT | sed -E 's/([^,]*)(,|$)/CONFIG_\1=y\n/g')

Of course running the resulting ./linux says:

Checking PROT_EXEC mmap in /dev/shm...Operation not permitted
/dev/shm must be not mounted noexec

But *shrug*, Devuan did that not me. I haven't really used UML since QEMU
started working. Shouldn't the old "create file, map file, delete file" trick
stop flushing the data to backing store no matter where the file lives? I mean,
that trick dates back to the VAX, and we argued about it on the UML list a
decade ago (circa
https://sourceforge.net/p/user-mode-linux/mailman/message/14000710/) but...
fixing random things that are wrong with Linux is not my problem anymore. I'm
only in this thread because I'm cc'd.

Spending five years repeatedly posting perl removal patches and ending up with
intentional sabotage at the end from the guy who'd added perl in the first place
when the Gratuitous Build Dependency Removal patches finally got traction
(https://landley.net/notes-2013.html#28-03-2013) kinda put me off doing that
again.

> The file descriptor is stored in mm->exe_file.
> Probably the most straight forward implementation is to allow
> execveat(AT_EXE_FILE, ...).

Cool, that works.

> You can look at binfmt_misc for how to reopen an open file descriptor.

Added to the todo heap.

Thanks,

Rob
Rob Landley <rob@landley.net> writes:

> On 5/20/20 11:05 AM, Eric W. Biederman wrote:

> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
> bash-compatible shell with nommu support, which means in order to do subshell
> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
> have the child exec itself to unblock the parent, and then read the context data
> that just got discarded through the pipe from the parent. ("Wheee." And you can
> quote me on that.)

Do you have clone(CLONE_VM) ? If my quick skim of the kernel sources is
correct that should be the same as vfork except without causing the
parent to wait for you. Which I think would remove the need to reexec
yourself.

>> The file descriptor is stored in mm->exe_file.
>> Probably the most straight forward implementation is to allow
>> execveat(AT_EXE_FILE, ...).
>
> Cool, that works.
>
>> You can look at binfmt_misc for how to reopen an open file descriptor.
>
> Added to the todo heap.

Yes I don't think it would be a lot of code.

I think you might be better served with clone(CLONE_VM) as it doesn't
block so you don't need to feed yourself your context over a pipe.

Eric
On 5/21/20 10:28 PM, Eric W. Biederman wrote:
>
> Rob Landley <rob@landley.net> writes:
>
>> On 5/20/20 11:05 AM, Eric W. Biederman wrote:
>
>> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
>> bash-compatible shell with nommu support, which means in order to do subshell
>> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
>> have the child exec itself to unblock the parent, and then read the context data
>> that just got discarded through the pipe from the parent. ("Wheee." And you can
>> quote me on that.)
>
> Do you have clone(CLONE_VM) ? If my quick skim of the kernel sources is
> correct that should be the same as vfork except without causing the
> parent to wait for you. Which I think would remove the need to reexec
> yourself.

As with perpetual motion, that only seems like it would work if you don't
understand what's going on.

A nommu system uses physical addresses, not virtual ones, so every process sees
the same addresses. So if I allocate a new block of memory and memcpy the
contents of the old one into the new one, any pointers in the copy point back
into the ORIGINAL block of memory. Trying to adjust the pointers in the copy is
the exact same problem as trying to do garbage collection in C: it's an AI
complete problem.

Any attempt to "implement a full fork" on nommu hits this problem: copying an
existing mapping to a new address range means any address values in the new
mapping point into the OLD mapping. Things like fdpic fix this up at exec time
(traversing elf tables and relocating), but not at runtime. If you can solve the
"relocate at runtime all addresses within an existing mapping, and all other
mappings that might point to this mapping, including local variables on the
stack that point to a structure member or halfway into a string rather than the
start of an allocation, without adjusting unrelated values coincidentally within
RANGE of a mapping" problem, THEN you can fork on a nommu system.

What vfork() does is pause the parent and have the child continue AS the parent
for a bit (with the system call returning 0). The child starts with all the same
memory mappings the parent has (usually not even a new stack). The child has a
new PID and new resources like its own file descriptor table so close() and
open() don't affect the parent, but if you change a global that's visible to the
parent when it resumes (and often local variables too: don't return from the
function that called vfork() because if you DON'T have a new stack it'll stomp
the return address the parent needs when IT does it). If the child calls
malloc() the parent needs to free it because it's the same heap (because same
mapping of the same physical memory). Then when the child is ready to discard
all those mappings (due to calling either execve() or _exit(), those are the
only two options), the parent resumes from where it left off with the PID of the
child as the system call return value.

The reason the child pauses the parent is so only one process is ever using
those mappings at a given time. Otherwise they're acting like threads without
locking, and usually both are sharing a stack.

P.S. You can use threads _instead_ of fork for some stuff on nommu, but that's
its own can of worms. You still need to vfork() when you do create a child
process you're going to exec, so it doesn't go away, you're just requiring
multiple techniques simultaneously to handle a special case.

P.P.S. vfork() is useful on mmu systems to solve the "don't fork from a thread"
problem. You can vfork() from a thread cheaply and reliably and it only pauses
the one thread you forked from, not every thread in the whole process. If you
fork() from a heavily threaded process you can cause a multi-millisecond latency
spike because even with an mmu the copy on write "keep track of what's shared by
what" generally can't handle the "threads AND processes sharing mappings" case,
so it just gives up and copies it all at fork time, in one go, holding a big
lock while doing so. This causes a large latency spike which vfork() avoids.
(And can cause a large wasteful allocation and memory dirtying which is
immediately freed.)

>>> The file descriptor is stored in mm->exe_file.
>>> Probably the most straight forward implementation is to allow
>>> execveat(AT_EXE_FILE, ...).
>>
>> Cool, that works.
>>
>>> You can look at binfmt_misc for how to reopen an open file descriptor.
>>
>> Added to the todo heap.
>
> Yes I don't think it would be a lot of code.
>
> I think you might be better served with clone(CLONE_VM) as it doesn't
> block so you don't need to feed yourself your context over a pipe.

Except that doesn't fix it.

Yes I could use threads instead, but the cure is worse than the disease and the
result is your shell background processes are threads rather than independent
processes (is $$ reporting PID or TID, I really don't want to go there).

> Eric

Rob
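The vfork() discipline described above (the child borrows the parent's mappings and may only exec or _exit) can be sketched in a few lines of C. This is a minimal illustration rather than code from toybox; `spawn_and_wait` is a hypothetical helper name, and the exit code 127 for a failed exec is borrowed from shell convention as an assumption.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] in a child created with vfork() and return its exit status,
 * or -1 if the child could not be created or did not exit normally.
 */
int spawn_and_wait(char *const argv[], char *const envp[])
{
	pid_t pid;
	int status;

	assert(argv != NULL && argv[0] != NULL);

	pid = vfork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/*
		 * Child: running on the parent's memory while the parent is
		 * paused. Touch nothing, do not return from this function,
		 * and leave only through execve() or _exit().
		 */
		execve(argv[0], argv, envp);
		_exit(127); /* exec failed */
	}

	/* Parent: resumes only after the child has exec'd or exited. */
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `{ "/bin/sh", "-c", "exit 3", NULL }` should return the shell's exit status. A child that needs the parent's in-memory state after the exec has to get it some other way, which is why the shell described above passes its context over a pipe.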
Rob Landley <rob@landley.net> writes:
> On 5/21/20 10:28 PM, Eric W. Biederman wrote:
>>
>> Rob Landley <rob@landley.net> writes:
>>
>>> On 5/20/20 11:05 AM, Eric W. Biederman wrote:
>>
>>>> The file descriptor is stored in mm->exe_file.
>>>> Probably the most straight forward implementation is to allow
>>>> execveat(AT_EXE_FILE, ...).
>>>
>>> Cool, that works.
>>>
>>>> You can look at binfmt_misc for how to reopen an open file descriptor.
>>>
>>> Added to the todo heap.
>>
>> Yes I don't think it would be a lot of code.
>>
>> I think you might be better served with clone(CLONE_VM) as it doesn't
>> block so you don't need to feed yourself your context over a pipe.
>
> Except that doesn't fix it.
>
> Yes I could use threads instead, but the cure is worse than the disease and the
> result is your shell background processes are threads rather than independent
> processes (is $$ reporting PID or TID, I really don't want to go
> there).

I was just suggesting clone(CLONE_VM) because it creates a thread in a
separate process. Which on nommu sounds like it could be almost exactly
what you want.

If you need the separate copies of all of your global variables etc,
re-exec'ing yourself could be the easier way to go.

Eric
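
For readers who have not used it, the clone(CLONE_VM) idea suggested above
might look roughly like the sketch below. It uses the glibc clone() wrapper;
the helper names, the 64 KiB stack size and the shared variable are
assumptions made up for the illustration. The child gets its own PID but
shares the parent's memory, and the parent is not paused the way it is with
vfork().

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MODEL_STACK_SIZE (64 * 1024)	/* arbitrary size for the sketch */

/* Lives in memory both tasks can see because of CLONE_VM. */
static volatile int model_shared_value;

/* Runs in the child; writes through the shared mapping and exits. */
static int model_child(void *arg)
{
	model_shared_value = *(int *)arg;
	return 0;
}

/*
 * Hypothetical helper: start a CLONE_VM child, wait for it, and return
 * what it stored in the shared variable (or -1 on error).
 */
int run_clone_vm(int value)
{
	char *stack = malloc(MODEL_STACK_SIZE);
	int status;
	pid_t pid;

	if (!stack)
		return -1;

	/* Assumes a downward-growing stack, so pass the top of the buffer. */
	pid = clone(model_child, stack + MODEL_STACK_SIZE,
		    CLONE_VM | SIGCHLD, &value);
	if (pid < 0 || waitpid(pid, &status, 0) < 0) {
		free(stack);
		return -1;
	}

	free(stack);
	return model_shared_value;
}
```

The parent here waits immediately only to keep the example short; nothing
forces it to, which is the difference from vfork() being pointed out. It also
shows the cost Rob mentions: every global is shared with the child.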
Recomputing the uids, gids, capabilities, and related flags each time a
new bprm->file is set is error prone, and as it turns out unnecessary.

Further, our decisions on when to clear personality bits and when to tell
userspace that privileges have been gained (so please be extra careful)
are imperfect, and our current code overshoots in inconsistent ways,
making it hard to understand what is happening and why.

Building upon my previous exec clean up work this set of changes moves
the bprm->cred calculations a little later so they only need to be done
once, moves all of the uid and gid handling into bprm_fill_uid, and then
cleans up setting secureexec and per_clear so they happen when they make
sense from a semantic perspective.

One of the largest challenges is dealing with how we revert the
credential change if it is discovered the process calling exec is
ptraced and the tracer does not have enough credentials. It looks like
that code was tacked on as an afterthought to a bug fix that went into
2.4.0-prerelease. I don't know if we have ever gotten all of the details
just right when the credentials are rolled back. So this set of changes
causes the credentials not to be changed when ptraced, instead of
attempting to roll back the credential change.

Folks please give this code a review and let me know if you see anything.

Eric W. Biederman (11):
      exec: Reduce bprm->per_clear to a single bit
      exec: Introduce active_per_clear the per file version of per_clear
      exec: Compute file based creds only once
      exec: Move uid/gid handling from creds_from_file into bprm_fill_uid
      exec: In bprm_fill_uid use CAP_SETGID to see if a gid change is safe
      exec: Don't set secureexec when the uid or gid changes are abandoned
      exec: Set saved, fs, and effective ids together in bprm_fill_uid
      exec: In bprm_fill_uid remove unnecessary no new privs check
      exec: In bprm_fill_uid only set per_clear when honoring suid or sgid
      exec: In bprm_fill_uid set secureexec at same time as per_clear
      exec: Remove the label after_setid from bprm_fill_uid

 fs/binfmt_misc.c              |  2 +-
 fs/exec.c                     | 95 +++++++++++++++++++++++++------------------
 include/linux/binfmts.h       | 13 +++---
 include/linux/lsm_hook_defs.h |  2 +-
 include/linux/lsm_hooks.h     | 21 ++++++----
 include/linux/security.h      |  8 ++--
 security/apparmor/domain.c    |  2 +-
 security/commoncap.c          | 37 ++++++-----------
 security/security.c           |  4 +-
 security/selinux/hooks.c      |  2 +-
 security/smack/smack_lsm.c    |  2 +-
 11 files changed, 98 insertions(+), 90 deletions(-)

---

This builds upon my previous exec cleanup work at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next

Thank you,
Eric
The bprm->per_clear field only takes the values 0 and PER_CLEAR_ON_SETID.
Reduce the field to a single bit to make it clear that the only question
is whether the dangerous personality bits should be cleared or not.

Update the documentation of the security lsm hooks.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                  | 7 ++++---
 include/linux/binfmts.h    | 4 +++-
 include/linux/lsm_hooks.h  | 4 ++++
 security/apparmor/domain.c | 2 +-
 security/commoncap.c       | 2 +-
 security/selinux/hooks.c   | 2 +-
 security/smack/smack_lsm.c | 2 +-
 7 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c3c879a55d65..51fab62b9fca 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1354,7 +1354,8 @@ int begin_new_exec(struct linux_binprm * bprm)
 	me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
 					PF_NOFREEZE | PF_NO_SETAFFINITY);
 	flush_thread();
-	me->personality &= ~bprm->per_clear;
+	if (bprm->per_clear)
+		me->personality &= ~PER_CLEAR_ON_SETID;

 	/*
 	 * We have to apply CLOEXEC before we change whether the process is
@@ -1628,12 +1629,12 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 		return;

 	if (mode & S_ISUID) {
-		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->per_clear = 1;
 		bprm->cred->euid = uid;
 	}

 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->per_clear = 1;
 		bprm->cred->egid = gid;
 	}
 }
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 7fc05929c967..e7959a6a895a 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,6 +26,9 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
+		/* Should unsafe personality bits be cleared? */
+		per_clear:1,
+
 		/* Should an execfd be passed to userspace? */
 		have_execfd:1,

@@ -55,7 +58,6 @@ struct linux_binprm {
 	struct file * file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
-	unsigned int per_clear;	/* bits to clear in current->personality */
 	int argc, envc;
 	const char * filename;	/* Name of binary as seen by procps */
 	const char * interp;	/* Name of the binary really executed. Most
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index d618ecc4d660..0ca68ad53592 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -42,6 +42,8 @@
  *	(e.g. for transitions between security domains).
  *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
  *	request libc enable secure mode.
+ *	The hook must set @bprm->per_clear to 1 if the dangerous personality
+ *	bits must be cleared from current->personality.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
  * @bprm_repopulate_creds:
@@ -55,6 +57,8 @@
  *	transitions between security domains).
  *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
  *	request libc enable secure mode.
+ *	The hook must set @bprm->per_clear to 1 if the dangerous personality
+ *	bits must be cleared from current->personality.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
  * @bprm_check_security:
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 0b870a647488..c6d00735a40a 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -962,7 +962,7 @@ int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm)
 			aa_label_printk(new, GFP_KERNEL);
 			dbg_printk("\n");
 		}
-		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->per_clear = 1;
 	}
 	aa_put_label(cred_label(bprm->cred));
 	/* transfer reference, released when cred is freed */
diff --git a/security/commoncap.c b/security/commoncap.c
index 77b04cb6feac..48b556046483 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -826,7 +826,7 @@ int cap_bprm_repopulate_creds(struct linux_binprm *bprm)

 	/* if we have fs caps, clear dangerous personality flags */
 	if (__cap_gained(permitted, new, old))
-		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->per_clear = 1;

 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
 	 * credentials unless they have the appropriate permit.
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 718345dd76bb..6bea1b879fdb 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2385,7 +2385,7 @@ static int selinux_bprm_creds_for_exec(struct linux_binprm *bprm)
 		}

 		/* Clear any possibly unsafe personality bits on exec: */
-		bprm->per_clear |= PER_CLEAR_ON_SETID;
+		bprm->per_clear = 1;

 		/* Enable secure mode for SIDs transitions unless
 		   the noatsecure permission is granted between
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 0ac8f4518d07..a0d2fad27b33 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -933,7 +933,7 @@ static int smack_bprm_creds_for_exec(struct linux_binprm *bprm)
 		return -EPERM;

 	bsp->smk_task = isp->smk_task;
-	bprm->per_clear |= PER_CLEAR_ON_SETID;
+	bprm->per_clear = 1;

 	/* Decide if this is a secure exec. */
 	if (bsp->smk_task != bsp->smk_forked)
-- 
2.25.0
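
To make the effect of the patch above concrete, here is a small userspace
model of the new begin_new_exec() logic. Everything prefixed with model_ or
MODEL_ is hypothetical and exists only for this sketch; the flag values are my
reading of the uapi personality header and should be treated as assumptions.

```c
/* Assumed values of the personality flags that make up PER_CLEAR_ON_SETID. */
enum {
	MODEL_ADDR_NO_RANDOMIZE  = 0x0040000,
	MODEL_MMAP_PAGE_ZERO     = 0x0100000,
	MODEL_ADDR_COMPAT_LAYOUT = 0x0200000,
	MODEL_READ_IMPLIES_EXEC  = 0x0400000,
	MODEL_PER_CLEAR_ON_SETID = MODEL_READ_IMPLIES_EXEC |
				   MODEL_ADDR_NO_RANDOMIZE |
				   MODEL_ADDR_COMPAT_LAYOUT |
				   MODEL_MMAP_PAGE_ZERO,
};

/* Stand-in for the one field of struct linux_binprm that matters here. */
struct model_bprm {
	unsigned int per_clear:1;	/* clear the dangerous bits, yes or no */
};

/*
 * Mirrors "if (bprm->per_clear) me->personality &= ~PER_CLEAR_ON_SETID;".
 * The mask is fixed; the bit only says whether to apply it.
 */
unsigned int model_apply_per_clear(const struct model_bprm *bprm,
				   unsigned int personality)
{
	if (bprm->per_clear)
		personality &= ~(unsigned int)MODEL_PER_CLEAR_ON_SETID;
	return personality;
}
```

Since the old field only ever held 0 or the full mask, the model behaves the
same as "personality &= ~per_clear" did before the change.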
When the credentials have been recomputed per file the per_clear status
has not been recomputed. Update the per file calculations to recompute
per_clear on a per file basis in a separate variable and to combine
that variable into the final per_clear value.

This makes which personality bits are cleared depend not on the
permissions of shell scripts with interpreters, but only on the final
bprm->file that bprm_fill_uid and security_bprm_repopulate_creds are
called upon.

History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                 | 7 ++++---
 include/linux/binfmts.h   | 3 +++
 include/linux/lsm_hooks.h | 2 +-
 security/commoncap.c      | 2 +-
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 51fab62b9fca..221d12dcaa3e 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1354,7 +1354,7 @@ int begin_new_exec(struct linux_binprm * bprm)
 	me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
 					PF_NOFREEZE | PF_NO_SETAFFINITY);
 	flush_thread();
-	if (bprm->per_clear)
+	if (bprm->per_clear || bprm->active_per_clear)
 		me->personality &= ~PER_CLEAR_ON_SETID;

 	/*
@@ -1629,12 +1629,12 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 		return;

 	if (mode & S_ISUID) {
-		bprm->per_clear = 1;
+		bprm->active_per_clear = 1;
 		bprm->cred->euid = uid;
 	}

 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-		bprm->per_clear = 1;
+		bprm->active_per_clear = 1;
 		bprm->cred->egid = gid;
 	}
 }
@@ -1655,6 +1655,7 @@ static int prepare_binprm(struct linux_binprm *bprm)

 		/* Recompute parts of bprm->cred based on bprm->file */
 		bprm->active_secureexec = 0;
+		bprm->active_per_clear = 0;
 		bprm_fill_uid(bprm);
 		retval = security_bprm_repopulate_creds(bprm);
 		if (retval)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index e7959a6a895a..89231a689957 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,6 +26,9 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
+		/* Does bprm->file warrant clearing personality bits? */
+		active_per_clear:1,
+
 		/* Should unsafe personality bits be cleared? */
 		per_clear:1,

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 0ca68ad53592..62e60e55cb99 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -57,7 +57,7 @@
  *	transitions between security domains).
  *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
  *	request libc enable secure mode.
- *	The hook must set @bprm->per_clear to 1 if the dangerous personality
+ *	The hook must set @bprm->active_per_clear to 1 if the dangerous personality
  *	bits must be cleared from current->personality.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
diff --git a/security/commoncap.c b/security/commoncap.c
index 48b556046483..0b72d7bf23e1 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -826,7 +826,7 @@ int cap_bprm_repopulate_creds(struct linux_binprm *bprm)

 	/* if we have fs caps, clear dangerous personality flags */
 	if (__cap_gained(permitted, new, old))
-		bprm->per_clear = 1;
+		bprm->active_per_clear = 1;

 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
 	 * credentials unless they have the appropriate permit.
-- 
2.25.0
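
The behaviour the patch above is after can be shown with a tiny userspace
model of a script-then-interpreter chain. The model_ names and the single
"setid" flag per file are simplifications invented for this sketch; the real
code looks at suid, sgid and file capabilities separately.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a file that may be suid/sgid or carry fscaps. */
struct model_chain_file {
	bool setid;
};

/* The two bits from struct linux_binprm that the patch deals with. */
struct model_chain_bprm {
	unsigned int per_clear:1;		/* sticky, set by creds_for_exec hooks */
	unsigned int active_per_clear:1;	/* recomputed for every bprm->file */
};

/* Models the per-file part of prepare_binprm(): reset, then recompute. */
void model_prepare_binprm(struct model_chain_bprm *bprm,
			  const struct model_chain_file *file)
{
	bprm->active_per_clear = 0;	/* forget what the previous file said */
	if (file->setid)
		bprm->active_per_clear = 1;
}

/* Models the combined test in begin_new_exec(). */
bool model_should_clear(const struct model_chain_bprm *bprm)
{
	return bprm->per_clear || bprm->active_per_clear;
}
```

Running model_prepare_binprm() first on a setid script and then on a plain
interpreter leaves nothing to clear, while the reverse order does, which is
the "only the final bprm->file counts" rule from the commit message.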
Move the computation of creds from prepare_binprm into begin_new_exec
so that the creds can be computed only once.

I have looked through the kernel and verified none of the binfmts look
at bprm->cred directly so computing the bprm->cred later should be
safe.

Rename preserve_creds to execfd_creds to make it clear that the creds
should be derived from the executable file descriptor.

Remove active_secureexec and active_per_clear and use secureexec and
per_clear respectively. The active versions of these variables were
only necessary to allow their values to be recomputed from scratch for
each value of bprm->file.

Remove the now unnecessary work from bprm_fill_uid to reset the
bprm->cred->euid and bprm->cred->egid, and add a small comment
about what bprm_fill_uid now does.

Remove the now unnecessary work in cap_bprm_creds_from_file to reset
the ambient capabilities, and add a small comment about what
cap_bprm_creds_from_file does.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/binfmt_misc.c              |  2 +-
 fs/exec.c                     | 65 +++++++++++++++++------------------
 include/linux/binfmts.h       | 12 ++-----
 include/linux/lsm_hook_defs.h |  2 +-
 include/linux/lsm_hooks.h     | 19 +++++-----
 include/linux/security.h      |  8 ++---
 security/commoncap.c          | 12 +++----
 security/security.c           |  4 +--
 8 files changed, 57 insertions(+), 67 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 53968ea07b57..bc5506619b7e 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -192,7 +192,7 @@ static int load_misc_binary(struct linux_binprm *bprm)

 	bprm->interpreter = interp_file;
 	if (fmt->flags & MISC_FMT_CREDENTIALS)
-		bprm->preserve_creds = 1;
+		bprm->execfd_creds = 1;

 	retval = 0;
 ret:
diff --git a/fs/exec.c b/fs/exec.c
index 221d12dcaa3e..091ff6269610 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -72,6 +72,8 @@

 #include <trace/events/sched.h>

+static int bprm_creds_from_file(struct linux_binprm *bprm);
+
 int suid_dumpable = 0;

 static LIST_HEAD(formats);
@@ -1304,6 +1306,11 @@ int begin_new_exec(struct linux_binprm * bprm)
 	struct task_struct *me = current;
 	int retval;

+	/* Once we are committed compute the creds */
+	retval = bprm_creds_from_file(bprm);
+	if (retval)
+		return retval;
+
 	/*
 	 * Ensure all future errors are fatal.
 	 */
@@ -1354,7 +1361,7 @@ int begin_new_exec(struct linux_binprm * bprm)
 	me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
 					PF_NOFREEZE | PF_NO_SETAFFINITY);
 	flush_thread();
-	if (bprm->per_clear || bprm->active_per_clear)
+	if (bprm->per_clear)
 		me->personality &= ~PER_CLEAR_ON_SETID;

 	/*
@@ -1365,13 +1372,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	 */
 	do_close_on_exec(me->files);

-	/*
-	 * Once here, prepare_binrpm() will not be called any more, so
-	 * the final state of setuid/setgid/fscaps can be merged into the
-	 * secureexec flag.
-	 */
-	bprm->secureexec |= bprm->active_secureexec;
-
 	if (bprm->secureexec) {
 		/* Make sure parent cannot signal privileged process. */
 		me->pdeath_signal = 0;
@@ -1589,20 +1589,12 @@ static void check_unsafe_exec(struct linux_binprm *bprm)

 static void bprm_fill_uid(struct linux_binprm *bprm)
 {
+	/* Handle suid and sgid on files */
 	struct inode *inode;
 	unsigned int mode;
 	kuid_t uid;
 	kgid_t gid;

-	/*
-	 * Since this can be called multiple times (via prepare_binprm),
-	 * we must clear any previous work done when setting set[ug]id
-	 * bits from any earlier bprm->file uses (for example when run
-	 * first for a setuid script then again for its interpreter).
-	 */
-	bprm->cred->euid = current_euid();
-	bprm->cred->egid = current_egid();
-
 	if (!mnt_may_suid(bprm->file->f_path.mnt))
 		return;

@@ -1629,19 +1621,38 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 		return;

 	if (mode & S_ISUID) {
-		bprm->active_per_clear = 1;
+		bprm->per_clear = 1;
 		bprm->cred->euid = uid;
 	}

 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-		bprm->active_per_clear = 1;
+		bprm->per_clear = 1;
 		bprm->cred->egid = gid;
 	}
 }

+/*
+ * Compute brpm->cred based upon the final binary.
+ */
+static int bprm_creds_from_file(struct linux_binprm *bprm)
+{
+	struct file *file = bprm->file;
+	int retval;
+
+	/* Compute creds from the executable passed to userspace? */
+	if (bprm->execfd_creds)
+		bprm->file = bprm->executable;
+
+	bprm_fill_uid(bprm);
+	retval = security_bprm_creds_from_file(bprm);
+	bprm->file = file;
+
+	return retval;
+}
+
 /*
  * Fill the binprm structure from the inode.
- * Check permissions, then read the first BINPRM_BUF_SIZE bytes
+ * Read the first BINPRM_BUF_SIZE bytes
  *
  * This may be called multiple times for binary chains (scripts for example).
  */
@@ -1649,20 +1660,6 @@ static int prepare_binprm(struct linux_binprm *bprm)
 {
 	loff_t pos = 0;

-	/* Can the interpreter get to the executable without races? */
-	if (!bprm->preserve_creds) {
-		int retval;
-
-		/* Recompute parts of bprm->cred based on bprm->file */
-		bprm->active_secureexec = 0;
-		bprm->active_per_clear = 0;
-		bprm_fill_uid(bprm);
-		retval = security_bprm_repopulate_creds(bprm);
-		if (retval)
-			return retval;
-	}
-	bprm->preserve_creds = 0;
-
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
 }
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 89231a689957..39f6b5a7ace7 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,22 +26,14 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
-		/* Does bprm->file warrant clearing personality bits? */
-		active_per_clear:1,
-
 		/* Should unsafe personality bits be cleared? */
 		per_clear:1,

 		/* Should an execfd be passed to userspace? */
 		have_execfd:1,

-		/* It is safe to use the creds of a script (see binfmt_misc) */
-		preserve_creds:1,
-		/*
-		 * True if most recent call to security_bprm_set_creds
-		 * resulted in elevated privileges.
-		 */
-		active_secureexec:1,
+		/* Use the creds of a script (see binfmt_misc) */
+		execfd_creds:1,
 		/*
 		 * Set by bprm_creds_for_exec hook to indicate a
 		 * privilege-gaining exec has happened. Used to set
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 1e295ba12c0d..36b07c1eb0f1 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -50,7 +50,7 @@ LSM_HOOK(int, 0, settime, const struct timespec64 *ts,
 	 const struct timezone *tz)
 LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages)
 LSM_HOOK(int, 0, bprm_creds_for_exec, struct linux_binprm *bprm)
-LSM_HOOK(int, 0, bprm_repopulate_creds, struct linux_binprm *bprm)
+LSM_HOOK(int, 0, bprm_creds_from_file, struct linux_binprm *bprm)
 LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committed_creds, struct linux_binprm *bprm)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 62e60e55cb99..0aeaa3de69b2 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -46,18 +46,19 @@
  *	bits must be cleared from current->personality.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
- * @bprm_repopulate_creds:
- *	Assuming that the relevant bits of @bprm->cred->security have been
- *	previously set, examine @bprm->file and regenerate them. This is
- *	so that the credentials derived from the interpreter the code is
- *	actually going to run are used rather than credentials derived
- *	from a script. This done because the interpreter binary needs to
- *	reopen script, and may end up opening something completely different.
+ * @bprm_creds_from_file:
+ *	If @bprm->file is setpcap, suid, sgid or otherwise marked to
+ *	change the privilege level upon exec update @bprm->cred to
+ *	handle the marking on the file. This is called after finding
+ *	the native code binary that will be executed. This ensures that
+ *	the credentials will not be derived from a script that the binary
+ *	will need to reopen, which when reopend may end up being a completely
+ *	different file.
  *	This hook may also optionally check permissions (e.g. for
  *	transitions between security domains).
- *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
+ *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
  *	request libc enable secure mode.
- *	The hook must set @bprm->active_per_clear to 1 if the dangerous personality
+ *	The hook must set @bprm->per_clear to 1 if the dangerous personality
  *	bits must be cleared from current->personality.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
diff --git a/include/linux/security.h b/include/linux/security.h
index 6dcec9375e8f..df8ad2fb7374 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -140,7 +140,7 @@ extern int cap_capset(struct cred *new, const struct cred *old,
 		      const kernel_cap_t *effective,
 		      const kernel_cap_t *inheritable,
 		      const kernel_cap_t *permitted);
-extern int cap_bprm_repopulate_creds(struct linux_binprm *bprm);
+extern int cap_bprm_creds_from_file(struct linux_binprm *bprm);
 extern int cap_inode_setxattr(struct dentry *dentry, const char *name,
 			      const void *value, size_t size, int flags);
 extern int cap_inode_removexattr(struct dentry *dentry, const char *name);
@@ -277,7 +277,7 @@ int security_syslog(int type);
 int security_settime64(const struct timespec64 *ts, const struct timezone *tz);
 int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
 int security_bprm_creds_for_exec(struct linux_binprm *bprm);
-int security_bprm_repopulate_creds(struct linux_binprm *bprm);
+int security_bprm_creds_from_file(struct linux_binprm *bprm);
 int security_bprm_check(struct linux_binprm *bprm);
 void security_bprm_committing_creds(struct linux_binprm *bprm);
 void security_bprm_committed_creds(struct linux_binprm *bprm);
@@ -575,9 +575,9 @@ static inline int security_bprm_creds_for_exec(struct linux_binprm *bprm)
 	return 0;
 }

-static inline int security_bprm_repopulate_creds(struct linux_binprm *bprm)
+static inline int security_bprm_creds_from_file(struct linux_binprm *bprm)
 {
-	return cap_bprm_repopulate_creds(bprm);
+	return cap_bprm_creds_from_file(bprm);
 }

 static inline int security_bprm_check(struct linux_binprm *bprm)
diff --git a/security/commoncap.c b/security/commoncap.c
index 0b72d7bf23e1..2bd1f24f3796 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -797,22 +797,22 @@ static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old,
 }

 /**
- * cap_bprm_repopulate_creds - Set up the proposed credentials for execve().
+ * cap_bprm_creds_from_file - Set up the proposed credentials for execve().
  * @bprm: The execution parameters, including the proposed creds
  *
  * Set up the proposed credentials for a new execution context being
  * constructed by execve(). The proposed creds in @bprm->cred is altered,
  * which won't take effect immediately. Returns 0 if successful, -ve on error.
  */
-int cap_bprm_repopulate_creds(struct linux_binprm *bprm)
+int cap_bprm_creds_from_file(struct linux_binprm *bprm)
 {
+	/* Process setpcap binaries and capabilities for uid 0 */
 	const struct cred *old = current_cred();
 	struct cred *new = bprm->cred;
 	bool effective = false, has_fcap = false, is_setid;
 	int ret;
 	kuid_t root_uid;

-	new->cap_ambient = old->cap_ambient;
 	if (WARN_ON(!cap_ambient_invariant_ok(old)))
 		return -EPERM;

@@ -826,7 +826,7 @@ int cap_bprm_repopulate_creds(struct linux_binprm *bprm)

 	/* if we have fs caps, clear dangerous personality flags */
 	if (__cap_gained(permitted, new, old))
-		bprm->active_per_clear = 1;
+		bprm->per_clear = 1;

 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
 	 * credentials unless they have the appropriate permit.
@@ -889,7 +889,7 @@ int cap_bprm_repopulate_creds(struct linux_binprm *bprm)
 	    (!__is_real(root_uid, new) &&
 	     (effective ||
 	      __cap_grew(permitted, ambient, new))))
-		bprm->active_secureexec = 1;
+		bprm->secureexec = 1;

 	return 0;
 }
@@ -1346,7 +1346,7 @@ static struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(ptrace_traceme, cap_ptrace_traceme),
 	LSM_HOOK_INIT(capget, cap_capget),
 	LSM_HOOK_INIT(capset, cap_capset),
-	LSM_HOOK_INIT(bprm_repopulate_creds, cap_bprm_repopulate_creds),
+	LSM_HOOK_INIT(bprm_creds_from_file, cap_bprm_creds_from_file),
 	LSM_HOOK_INIT(inode_need_killpriv, cap_inode_need_killpriv),
 	LSM_HOOK_INIT(inode_killpriv, cap_inode_killpriv),
 	LSM_HOOK_INIT(inode_getsecurity, cap_inode_getsecurity),
diff --git a/security/security.c b/security/security.c
index b890b7e2a765..0688359bf8f4 100644
--- a/security/security.c
+++ b/security/security.c
@@ -828,9 +828,9 @@ int security_bprm_creds_for_exec(struct linux_binprm *bprm)
 	return call_int_hook(bprm_creds_for_exec, 0, bprm);
 }

-int security_bprm_repopulate_creds(struct linux_binprm *bprm)
+int security_bprm_creds_from_file(struct linux_binprm *bprm)
 {
-	return call_int_hook(bprm_repopulate_creds, 0, bprm);
+	return call_int_hook(bprm_creds_from_file, 0, bprm);
 }

 int security_bprm_check(struct linux_binprm *bprm)
-- 
2.25.0
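
The swap-and-restore of bprm->file in the new bprm_creds_from_file() may be
easier to see in a stripped-down userspace model. The model_ types below are
invented for this sketch and only track a suid bit and an owner; the LSM hook
call is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the parts of a file that bprm_fill_uid reads. */
struct model_exec_file {
	unsigned int owner;	/* uid owning the file */
	bool suid;		/* set-user-ID bit present? */
};

/* Hypothetical stand-in for struct linux_binprm. */
struct model_exec_bprm {
	const struct model_exec_file *file;		/* final binary to be run */
	const struct model_exec_file *executable;	/* file passed to userspace */
	unsigned int execfd_creds:1;	/* derive creds from ->executable */
	unsigned int per_clear:1;
	unsigned int euid;		/* starts as the caller's euid */
};

/* Simplified bprm_fill_uid(): honor suid on whatever ->file points at. */
static void model_exec_fill_uid(struct model_exec_bprm *bprm)
{
	if (bprm->file->suid) {
		bprm->per_clear = 1;
		bprm->euid = bprm->file->owner;
	}
}

/* Simplified bprm_creds_from_file(): runs once, after the chain is resolved. */
int model_creds_from_file(struct model_exec_bprm *bprm)
{
	const struct model_exec_file *file = bprm->file;

	/* Temporarily look at the executable handed to userspace instead. */
	if (bprm->execfd_creds)
		bprm->file = bprm->executable;

	model_exec_fill_uid(bprm);
	bprm->file = file;	/* always restore the real final binary */

	return 0;
}
```

With execfd_creds set, a suid ->executable decides the euid even though the
interpreter in ->file is not suid; with it clear, only ->file is consulted.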
The logic in cap_bprm_creds_from_file is difficult to follow in part
because it handles both uids/gids and capabilities. That difficulty
in following the code has resulted in several small bugs. Move the
handling of uids/gids into bprm_fill_uid to make the code clearer.

A small bug is fixed where the ambient capabilities were unnecessarily
cleared when the presence of a ptracer or a shared fs_struct resulted
in the setuid or setgid not being honored. This bug was not possible
to leave in place with the movement of the uids and gids handling out
of cap_bprm_repopulate_creds.

The rest of the bugs I have tried to make more apparent but left
intact when moving the code into bprm_fill_uid.

Ref: ee67ae7ef6ff ("commoncap: Move cap_elevated calculation into bprm_set_creds")
Fixes: 58319057b784 ("capabilities: ambient capabilities")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c            | 49 ++++++++++++++++++++++++++++++++++++--------
 security/commoncap.c | 25 +++++++---------------
 2 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 091ff6269610..956ee3a0d824 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1590,21 +1590,23 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 static void bprm_fill_uid(struct linux_binprm *bprm)
 {
 	/* Handle suid and sgid on files */
+	struct cred *new = bprm->cred;
 	struct inode *inode;
 	unsigned int mode;
+	bool need_cap;
 	kuid_t uid;
 	kgid_t gid;

 	if (!mnt_may_suid(bprm->file->f_path.mnt))
-		return;
+		goto after_setid;

 	if (task_no_new_privs(current))
-		return;
+		goto after_setid;

 	inode = bprm->file->f_path.dentry->d_inode;
 	mode = READ_ONCE(inode->i_mode);
 	if (!(mode & (S_ISUID|S_ISGID)))
-		return;
+		goto after_setid;

 	/* Be careful if suid/sgid is set */
 	inode_lock(inode);
@@ -1616,19 +1618,50 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	inode_unlock(inode);

 	/* We ignore suid/sgid if there are no mappings for them in the ns */
-	if (!kuid_has_mapping(bprm->cred->user_ns, uid) ||
-		 !kgid_has_mapping(bprm->cred->user_ns, gid))
-		return;
+	if (!kuid_has_mapping(new->user_ns, uid) ||
+		 !kgid_has_mapping(new->user_ns, gid))
+		goto after_setid;

 	if (mode & S_ISUID) {
 		bprm->per_clear = 1;
-		bprm->cred->euid = uid;
+		new->euid = uid;
 	}

 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
 		bprm->per_clear = 1;
-		bprm->cred->egid = gid;
+		new->egid = gid;
+	}
+
+after_setid:
+	/* Will the new creds have multiple uids or gids? */
+	if (!uid_eq(new->euid, new->uid) || !gid_eq(new->egid, new->gid)) {
+		bprm->secureexec = 1;
+
+		/*
+		 * Is the root directory and working directory shared or is
+		 * the process traced and the tracing process does not have
+		 * CAP_SYS_PTRACE?
+		 *
+		 * In either case it is not safe to change the euid or egid
+		 * unless the current process has the appropriate cap and so
+		 * chaning the euid or egid was already possible.
+		 */
+		need_cap = bprm->unsafe & LSM_UNSAFE_SHARE ||
+			!ptracer_capable(current, new->user_ns);
+		if (need_cap && !uid_eq(new->euid, new->uid) &&
+		    (!ns_capable(new->user_ns, CAP_SETUID) ||
+		     (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))) {
+			new->euid = new->uid;
+		}
+		if (need_cap && !gid_eq(new->egid, new->gid) &&
+		    (!ns_capable(new->user_ns, CAP_SETUID) ||
+		     (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))) {
+			new->egid = new->gid;
+		}
 	}
+
+	new->suid = new->fsuid = new->euid;
+	new->sgid = new->fsgid = new->egid;
 }

 /*
diff --git a/security/commoncap.c b/security/commoncap.c
index 2bd1f24f3796..b39c7511862e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -809,7 +809,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm)
 	/* Process setpcap binaries and capabilities for uid 0 */
 	const struct cred *old = current_cred();
 	struct cred *new = bprm->cred;
-	bool effective = false, has_fcap = false, is_setid;
+	bool effective = false, has_fcap = false;
 	int ret;
 	kuid_t root_uid;

@@ -828,31 +828,21 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm)
 	if (__cap_gained(permitted, new, old))
 		bprm->per_clear = 1;

-	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
+	/* Don't let someone trace a setpcap binary with the revised
 	 * credentials unless they have the appropriate permit.
 	 *
 	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
 	 */
-	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
-
-	if ((is_setid || __cap_gained(permitted, new, old)) &&
+	if (__cap_gained(permitted, new, old) &&
 	    ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
 	     !ptracer_capable(current, new->user_ns))) {
 		/* downgrade; they get no more than they had, and maybe less */
-		if (!ns_capable(new->user_ns, CAP_SETUID) ||
-		    (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) {
-			new->euid = new->uid;
-			new->egid = new->gid;
-		}
 		new->cap_permitted = cap_intersect(new->cap_permitted,
 						   old->cap_permitted);
 	}

-	new->suid = new->fsuid = new->euid;
-	new->sgid = new->fsgid = new->egid;
-
 	/* File caps or setid cancels ambient. */
-	if (has_fcap || is_setid)
+	if (has_fcap || __is_setuid(new, old) || __is_setgid(new, old))
 		cap_clear(new->cap_ambient);

 	/*
@@ -885,10 +875,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm)
 		return -EPERM;

 	/* Check for privilege-elevated exec. */
-	if (is_setid ||
-	    (!__is_real(root_uid, new) &&
-	     (effective ||
-	      __cap_grew(permitted, ambient, new))))
+	if (!__is_real(root_uid, new) &&
+	    (effective ||
+	     __cap_grew(permitted, ambient, new)))
 		bprm->secureexec = 1;

 	return 0;
-- 
2.25.0
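
The after_setid block added to bprm_fill_uid() in the patch above can be
modelled in userspace as follows. All model_ names are hypothetical, and the
kernel checks (LSM_UNSAFE_SHARE, ptracer_capable(), ns_capable(),
LSM_UNSAFE_NO_NEW_PRIVS) are reduced to plain booleans for the sketch. It
deliberately keeps two quirks of this patch: the gid rollback is gated on the
same setuid capability as the uid rollback, and secureexec stays set even when
the change is abandoned. The series shortlog lists later patches addressing
both.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the id fields of struct cred. */
struct model_cred {
	unsigned int uid, euid, suid, fsuid;
	unsigned int gid, egid, sgid, fsgid;
};

/* Hypothetical summary of the checks the kernel code performs. */
struct model_ctx {
	bool shared_fs;		/* LSM_UNSAFE_SHARE is set */
	bool ptracer_capable;	/* no tracer, or the tracer is privileged enough */
	bool has_cap_setuid;	/* caller has CAP_SETUID in the user namespace */
	bool no_new_privs;	/* LSM_UNSAFE_NO_NEW_PRIVS is set */
};

/* Returns the value bprm->secureexec would get from this block. */
bool model_finish_ids(struct model_cred *new, const struct model_ctx *ctx)
{
	bool secureexec = false;

	/* Will the new creds have multiple uids or gids? */
	if (new->euid != new->uid || new->egid != new->gid) {
		bool need_cap = ctx->shared_fs || !ctx->ptracer_capable;
		bool may_keep = ctx->has_cap_setuid && !ctx->no_new_privs;

		secureexec = true;

		/* Unsafe situation and no capability: abandon the change. */
		if (need_cap && new->euid != new->uid && !may_keep)
			new->euid = new->uid;
		if (need_cap && new->egid != new->gid && !may_keep)
			new->egid = new->gid;
	}

	/* Saved and fs ids always follow the final effective ids. */
	new->suid = new->fsuid = new->euid;
	new->sgid = new->fsgid = new->egid;

	return secureexec;
}
```

For a suid-root binary run by uid 1000 under an unprivileged tracer, the model
puts euid back to 1000; without a tracer the euid of 0 is kept and copied into
suid and fsuid.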
The logic in cap_bprm_creds_from_file is difficult to follow in part
because it handles both uids/gids and capabilities. That difficulty
in following the code has resulted in several small bugs. Move the
handling of uids/gids into bprm_fill_uid to make the code clearer.

A small bug is fixed where the ambient capabilities were unnecessarily
cleared when the presence of a ptracer or a shared fs_struct resulted
in the setuid or setgid not being honored. This bug was not possible
to leave in place with the movement of the uids and gids handling out
of cap_bprm_repopulate_creds.

The rest of the bugs I have tried to make more apparent but left
intact when moving the code into bprm_fill_uid.

Ref: ee67ae7ef6ff ("commoncap: Move cap_elevated calculation into bprm_set_creds")
Fixes: 58319057b784 ("capabilities: ambient capabilities")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c            | 49 ++++++++++++++++++++++++++++++++++++--------
 security/commoncap.c | 25 +++++++---------------
 2 files changed, 48 insertions(+), 26 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 091ff6269610..956ee3a0d824 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1590,21 +1590,23 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 static void bprm_fill_uid(struct linux_binprm *bprm)
 {
 	/* Handle suid and sgid on files */
+	struct cred *new = bprm->cred;
 	struct inode *inode;
 	unsigned int mode;
+	bool need_cap;
 	kuid_t uid;
 	kgid_t gid;
 
 	if (!mnt_may_suid(bprm->file->f_path.mnt))
-		return;
+		goto after_setid;
 
 	if (task_no_new_privs(current))
-		return;
+		goto after_setid;
 
 	inode = bprm->file->f_path.dentry->d_inode;
 	mode = READ_ONCE(inode->i_mode);
 	if (!(mode & (S_ISUID|S_ISGID)))
-		return;
+		goto after_setid;
 
 	/* Be careful if suid/sgid is set */
 	inode_lock(inode);
@@ -1616,19 +1618,50 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	inode_unlock(inode);
 
 	/* We ignore suid/sgid if there are no mappings for them in the ns */
-	if (!kuid_has_mapping(bprm->cred->user_ns, uid) ||
-	    !kgid_has_mapping(bprm->cred->user_ns, gid))
-		return;
+	if (!kuid_has_mapping(new->user_ns, uid) ||
+	    !kgid_has_mapping(new->user_ns, gid))
+		goto after_setid;
 
 	if (mode & S_ISUID) {
 		bprm->per_clear = 1;
-		bprm->cred->euid = uid;
+		new->euid = uid;
 	}
 
 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
 		bprm->per_clear = 1;
-		bprm->cred->egid = gid;
+		new->egid = gid;
+	}
+
+after_setid:
+	/* Will the new creds have multiple uids or gids? */
+	if (!uid_eq(new->euid, new->uid) || !gid_eq(new->egid, new->gid)) {
+		bprm->secureexec = 1;
+
+		/*
+		 * Is the root directory and working directory shared or is
+		 * the process traced and the tracing process does not have
+		 * CAP_SYS_PTRACE?
+		 *
+		 * In either case it is not safe to change the euid or egid
+		 * unless the current process has the appropriate cap and so
+		 * chaning the euid or egid was already possible.
+		 */
+		need_cap = bprm->unsafe & LSM_UNSAFE_SHARE ||
+			!ptracer_capable(current, new->user_ns);
+		if (need_cap && !uid_eq(new->euid, new->uid) &&
+		    (!ns_capable(new->user_ns, CAP_SETUID) ||
+		     (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))) {
+			new->euid = new->uid;
+		}
+		if (need_cap && !gid_eq(new->egid, new->gid) &&
+		    (!ns_capable(new->user_ns, CAP_SETUID) ||
+		     (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))) {
+			new->egid = new->gid;
+		}
 	}
+
+	new->suid = new->fsuid = new->euid;
+	new->sgid = new->fsgid = new->egid;
 }
 
 /*
diff --git a/security/commoncap.c b/security/commoncap.c
index 2bd1f24f3796..b39c7511862e 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -809,7 +809,7 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm)
 	/* Process setpcap binaries and capabilities for uid 0 */
 	const struct cred *old = current_cred();
 	struct cred *new = bprm->cred;
-	bool effective = false, has_fcap = false, is_setid;
+	bool effective = false, has_fcap = false;
 	int ret;
 	kuid_t root_uid;
 
@@ -828,31 +828,21 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm)
 	if (__cap_gained(permitted, new, old))
 		bprm->per_clear = 1;
 
-	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
+	/* Don't let someone trace a setpcap binary with the revised
 	 * credentials unless they have the appropriate permit.
 	 *
 	 * In addition, if NO_NEW_PRIVS, then ensure we get no new privs.
 	 */
-	is_setid = __is_setuid(new, old) || __is_setgid(new, old);
-
-	if ((is_setid || __cap_gained(permitted, new, old)) &&
+	if (__cap_gained(permitted, new, old) &&
 	    ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
 	     !ptracer_capable(current, new->user_ns))) {
 		/* downgrade; they get no more than they had, and maybe less */
-		if (!ns_capable(new->user_ns, CAP_SETUID) ||
-		    (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) {
-			new->euid = new->uid;
-			new->egid = new->gid;
-		}
 		new->cap_permitted = cap_intersect(new->cap_permitted,
 						   old->cap_permitted);
 	}
 
-	new->suid = new->fsuid = new->euid;
-	new->sgid = new->fsgid = new->egid;
-
 	/* File caps or setid cancels ambient. */
-	if (has_fcap || is_setid)
+	if (has_fcap || __is_setuid(new, old) || __is_setgid(new, old))
 		cap_clear(new->cap_ambient);
 
 	/*
@@ -885,10 +875,9 @@ int cap_bprm_creds_from_file(struct linux_binprm *bprm)
 		return -EPERM;
 
 	/* Check for privilege-elevated exec. */
-	if (is_setid ||
-	    (!__is_real(root_uid, new) &&
-	     (effective ||
-	      __cap_grew(permitted, ambient, new))))
+	if (!__is_real(root_uid, new) &&
+	    (effective ||
+	     __cap_grew(permitted, ambient, new)))
 		bprm->secureexec = 1;
 
 	return 0;
-- 
2.25.0
If the task has CAP_SETGID and a shared fs struct or is being ptraced
then it is clear that nothing new is being introduced when the gid
changes, and so it is safe to honor a setgid executable.

However if all we know is that the task has CAP_SETUID things are less
clear.

This bug looks like it was introduced in v2.1.100 when !suser was
replaced by !capable(CAP_SETUID). It appears to have been an
oversight at that time.

Fixing this 22 years later seems weird, but even now it still looks
worth fixing. Conceptually, what is happening is a test to see
whether the process already had the potential to make a gid change,
or whether the tracer needs permissions beyond those needed to trace
the process in order to trace it through a gid change.

Fixes: v2.1.100
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 956ee3a0d824..bac8db14f30d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1654,7 +1654,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 			new->euid = new->uid;
 		}
 		if (need_cap && !gid_eq(new->egid, new->gid) &&
-		    (!ns_capable(new->user_ns, CAP_SETUID) ||
+		    (!ns_capable(new->user_ns, CAP_SETGID) ||
 		     (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))) {
 			new->egid = new->gid;
 		}
-- 
2.25.0
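The pairing of each id change with its own capability can be modeled in plain userspace C. This is only a sketch under stated assumptions, not kernel code: `struct exec_ctx`, `need_cap()`, `euid_change_kept()`, `egid_change_kept()` and `demo_setgid_check()` are hypothetical names, and the boolean fields stand in for the `bprm->unsafe` flags and the `ptracer_capable()` and `ns_capable()` results used in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the state bprm_fill_uid() consults. */
struct exec_ctx {
	bool fs_shared;       /* models bprm->unsafe & LSM_UNSAFE_SHARE */
	bool ptracer_capable; /* models ptracer_capable(current, user_ns) */
	bool no_new_privs;    /* models bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS */
	bool cap_setuid;      /* models ns_capable(user_ns, CAP_SETUID) */
	bool cap_setgid;      /* models ns_capable(user_ns, CAP_SETGID) */
};

/* A capability is required when the fs_struct is shared or the tracer is not capable. */
bool need_cap(const struct exec_ctx *c)
{
	return c->fs_shared || !c->ptracer_capable;
}

/* The euid change is kept unless a capability is needed and CAP_SETUID cannot justify it. */
bool euid_change_kept(const struct exec_ctx *c)
{
	return !need_cap(c) || (c->cap_setuid && !c->no_new_privs);
}

/* Same rule for the egid, but checked against CAP_SETGID as in the fix. */
bool egid_change_kept(const struct exec_ctx *c)
{
	return !need_cap(c) || (c->cap_setgid && !c->no_new_privs);
}

/* Example: shared fs_struct, CAP_SETUID held, CAP_SETGID not held. */
void demo_setgid_check(void)
{
	struct exec_ctx c = {
		.fs_shared = true, .ptracer_capable = true,
		.no_new_privs = false, .cap_setuid = true, .cap_setgid = false,
	};

	assert(euid_change_kept(&c));  /* CAP_SETUID justifies the uid change */
	assert(!egid_change_kept(&c)); /* but it says nothing about gid changes */
}
```

Calling `demo_setgid_check()` from a test harness exercises the case the patch is about: holding CAP_SETUID alone no longer lets a setgid change through.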
When the is_secureexec test was removed from cap_bprm_set_creds the
test was modified so that it based the status of secureexec on a
version of the euid and egid before the ptrace and shared fs tests
possibly reverted them. The effect is that secureexec is now set
even when the euid and egid changes were abandoned because the
executable was being ptraced.

As far as I can tell it is just an oversight and very poor quality of
implementation to set AT_SECURE when it is not necessary. So improve
the quality of the implementation by only setting secureexec when
there will be multiple uids or gids in the final cred structure.

Fixes: ee67ae7ef6ff ("commoncap: Move cap_elevated calculation into bprm_set_creds")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index bac8db14f30d..123402f218fe 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1622,44 +1622,38 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	    !kgid_has_mapping(new->user_ns, gid))
 		goto after_setid;
 
+	/*
+	 * Is the root directory and working directory shared or is
+	 * the process traced and the tracing process does not have
+	 * CAP_SYS_PTRACE?
+	 *
+	 * In either case it is not safe to change the euid or egid
+	 * unless the current process has the appropriate cap and so
+	 * chaning the euid or egid was already possible.
+	 */
+	need_cap = bprm->unsafe & LSM_UNSAFE_SHARE ||
+		!ptracer_capable(current, new->user_ns);
+
 	if (mode & S_ISUID) {
 		bprm->per_clear = 1;
-		new->euid = uid;
+		if (!need_cap ||
+		    (ns_capable(new->user_ns, CAP_SETUID) &&
+		     !(bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)))
+			new->euid = uid;
 	}
-
 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
 		bprm->per_clear = 1;
-		new->egid = gid;
+		if (!need_cap ||
+		    (ns_capable(new->user_ns, CAP_SETGID) &&
+		     !(bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)))
+			new->egid = gid;
 	}
 
 after_setid:
 	/* Will the new creds have multiple uids or gids? */
-	if (!uid_eq(new->euid, new->uid) || !gid_eq(new->egid, new->gid)) {
+	if (!uid_eq(new->euid, new->uid) || !gid_eq(new->egid, new->gid))
 		bprm->secureexec = 1;
 
-		/*
-		 * Is the root directory and working directory shared or is
-		 * the process traced and the tracing process does not have
-		 * CAP_SYS_PTRACE?
-		 *
-		 * In either case it is not safe to change the euid or egid
-		 * unless the current process has the appropriate cap and so
-		 * chaning the euid or egid was already possible.
-		 */
-		need_cap = bprm->unsafe & LSM_UNSAFE_SHARE ||
-			!ptracer_capable(current, new->user_ns);
-		if (need_cap && !uid_eq(new->euid, new->uid) &&
-		    (!ns_capable(new->user_ns, CAP_SETUID) ||
-		     (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))) {
-			new->euid = new->uid;
-		}
-		if (need_cap && !gid_eq(new->egid, new->gid) &&
-		    (!ns_capable(new->user_ns, CAP_SETGID) ||
-		     (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS))) {
-			new->egid = new->gid;
-		}
-	}
-
 	new->suid = new->fsuid = new->euid;
 	new->sgid = new->fsgid = new->egid;
 }
-- 
2.25.0
Now that there is only one place in bprm_fill_uid where the euid and
the egid are set, move the setting of the saved and fs ids to that
place. This makes it clear that this is the only location in the
function that changes these ids.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 123402f218fe..8dd7254931dc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1639,23 +1639,20 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 		if (!need_cap ||
 		    (ns_capable(new->user_ns, CAP_SETUID) &&
 		     !(bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)))
-			new->euid = uid;
+			new->suid = new->fsuid = new->euid = uid;
 	}
 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
 		bprm->per_clear = 1;
 		if (!need_cap ||
 		    (ns_capable(new->user_ns, CAP_SETGID) &&
 		     !(bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)))
-			new->egid = gid;
+			new->sgid = new->fsgid = new->egid = gid;
 	}
 
 after_setid:
 	/* Will the new creds have multiple uids or gids? */
 	if (!uid_eq(new->euid, new->uid) || !gid_eq(new->egid, new->gid))
 		bprm->secureexec = 1;
-
-	new->suid = new->fsuid = new->euid;
-	new->sgid = new->fsgid = new->egid;
 }
 
 /*
-- 
2.25.0
When the no new privs code was added[1], a test was added to
cap_bprm_set_creds to ensure that the credential changes were always
reverted if no new privs was set.

That test has been refactored into a test to not make the credential
change in bprm_fill_uid when no new privs is set. Remove that
unnecessary test as it can now be seen by a quick inspection that
execution can never make it to the test with no new privs set.

The same change[1] also added a test that guaranteed the credentials
would never change when no_new_privs was set, so the test I am
removing was never necessary but historically that was far from
obvious.

[1]: 259e5e6c75a9 ("Add PR_{GET,SET}_NO_NEW_PRIVS to prevent execve from granting privs")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 8dd7254931dc..af108ecf9632 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1636,16 +1636,12 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 
 	if (mode & S_ISUID) {
 		bprm->per_clear = 1;
-		if (!need_cap ||
-		    (ns_capable(new->user_ns, CAP_SETUID) &&
-		     !(bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)))
+		if (!need_cap || ns_capable(new->user_ns, CAP_SETUID))
 			new->suid = new->fsuid = new->euid = uid;
 	}
 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
 		bprm->per_clear = 1;
-		if (!need_cap ||
-		    (ns_capable(new->user_ns, CAP_SETGID) &&
-		     !(bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)))
+		if (!need_cap || ns_capable(new->user_ns, CAP_SETGID))
 			new->sgid = new->fsgid = new->egid = gid;
 	}
 
-- 
2.25.0
It makes no sense to set active_per_clear when the kernel decides not
to honor the executable's setuid or setgid bits. Instead set
active_per_clear when the kernel actually decides to honor the suid or
sgid permission bits of an executable.

As far as I can tell this was the intended behavior but with the
ptrace logic hiding out in security/commoncap.c:cap_bprm_apply_creds
I believe it was just overlooked that the setuid or setgid operation
could be cancelled.

History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index af108ecf9632..347dade4bc54 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1634,15 +1634,16 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	need_cap = bprm->unsafe & LSM_UNSAFE_SHARE ||
 		!ptracer_capable(current, new->user_ns);
 
-	if (mode & S_ISUID) {
+	if ((mode & S_ISUID) &&
+	    (!need_cap || ns_capable(new->user_ns, CAP_SETUID))) {
 		bprm->per_clear = 1;
-		if (!need_cap || ns_capable(new->user_ns, CAP_SETUID))
-			new->suid = new->fsuid = new->euid = uid;
+		new->suid = new->fsuid = new->euid = uid;
 	}
-	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
+
+	if (((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) &&
+	    (!need_cap || ns_capable(new->user_ns, CAP_SETGID))) {
 		bprm->per_clear = 1;
-		if (!need_cap || ns_capable(new->user_ns, CAP_SETGID))
-			new->sgid = new->fsgid = new->egid = gid;
+		new->sgid = new->fsgid = new->egid = gid;
 	}
 
 after_setid:
-- 
2.25.0
We currently have two different policies for setting per_clear and
for setting secureexec. For per_clear the policy is if the setxid
bits on a file are honored. For secureexec the policy is if the
resulting credentials will have multiple uids or gids.

Looking closely the policy for setting AT_SECURE and asking userspace
not to trust our caller in all cases where we have multiple uids or
gids does not make sense. In some of those cases it is the caller of
exec that provides multiple uids and gids.

The point of setting AT_SECURE is so that the called application or
its libraries can take defensive measures to guard against a lesser
privileged program which calls it via exec. If all of your privilege
comes from your caller there is no point in taking defensive measures
against them.

Further the only way that libc or other userspace can know that the
privilege came from the caller of exec and not from the exec being
suid or sgid is by the kernel telling it, as userspace does not have
enough information to distinguish between these two cases.

So set secureexec when the exec itself results in multiple uids or
gids, not when we happen to have multiple ids because the binary was
called that way.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 347dade4bc54..fc4edc7517a6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1637,19 +1637,19 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	if ((mode & S_ISUID) &&
 	    (!need_cap || ns_capable(new->user_ns, CAP_SETUID))) {
 		bprm->per_clear = 1;
+		bprm->secureexec = 1;
 		new->suid = new->fsuid = new->euid = uid;
 	}
 
 	if (((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) &&
 	    (!need_cap || ns_capable(new->user_ns, CAP_SETGID))) {
 		bprm->per_clear = 1;
+		bprm->secureexec = 1;
 		new->sgid = new->fsgid = new->egid = gid;
 	}
 
 after_setid:
-	/* Will the new creds have multiple uids or gids? */
-	if (!uid_eq(new->euid, new->uid) || !gid_eq(new->egid, new->gid))
-		bprm->secureexec = 1;
+	;
 }
 
 /*
-- 
2.25.0
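For readers less familiar with the userspace side of AT_SECURE, here is a minimal sketch of how a program or library might consume the value that secureexec ends up controlling. It assumes Linux with a libc that provides `getauxval()` in `<sys/auxv.h>`; the helper names, the `DEMO_PLUGIN_DIR` variable and the fallback path are made up for illustration and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/auxv.h> /* getauxval() and AT_SECURE */

/* True when the kernel set AT_SECURE in the auxiliary vector at exec time. */
bool exec_is_secure(void)
{
	return getauxval(AT_SECURE) != 0;
}

/*
 * Hypothetical defensive measure: ignore an environment override when the
 * exec was flagged as secure, because the caller may be less privileged.
 * DEMO_PLUGIN_DIR and the fallback path are invented for this sketch.
 */
const char *demo_plugin_dir(void)
{
	const char *dir = exec_is_secure() ? NULL : getenv("DEMO_PLUGIN_DIR");

	return dir ? dir : "/usr/lib/demo/plugins";
}

/* Whatever the mode, some directory is always returned. */
void demo_check(void)
{
	assert(demo_plugin_dir() != NULL);
}
```

The point of the commit message maps directly onto `exec_is_secure()`: userspace can only take this branch correctly if the kernel sets the flag when the exec itself granted the extra ids.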
There is nothing past the label after_setid in bprm_fill_uid so
replace code that jumps to it with return, and delete the label
entirely.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index fc4edc7517a6..ccb552fcdcff 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1598,15 +1598,15 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	kgid_t gid;
 
 	if (!mnt_may_suid(bprm->file->f_path.mnt))
-		goto after_setid;
+		return;
 
 	if (task_no_new_privs(current))
-		goto after_setid;
+		return;
 
 	inode = bprm->file->f_path.dentry->d_inode;
 	mode = READ_ONCE(inode->i_mode);
 	if (!(mode & (S_ISUID|S_ISGID)))
-		goto after_setid;
+		return;
 
 	/* Be careful if suid/sgid is set */
 	inode_lock(inode);
@@ -1620,7 +1620,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 	/* We ignore suid/sgid if there are no mappings for them in the ns */
 	if (!kuid_has_mapping(new->user_ns, uid) ||
 	    !kgid_has_mapping(new->user_ns, gid))
-		goto after_setid;
+		return;
 
 	/*
 	 * Is the root directory and working directory shared or is
@@ -1647,9 +1647,6 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 		bprm->secureexec = 1;
 		new->sgid = new->fsgid = new->egid = gid;
 	}
-
-after_setid:
-	;
 }
 
 /*
-- 
2.25.0
On Thu, May 28, 2020 at 8:45 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> - me->personality &= ~bprm->per_clear;
> + if (bprm->per_clear)
> + me->personality &= ~PER_CLEAR_ON_SETID;\

My only problem with this patch is that I find that 'per_clear' thing
to be a horrid horrid name.

Obviously the name didn't change, but the use *did* change, and as
such the name got worse. It used to do things like

    bprm->per_clear |= PER_CLEAR_ON_SETID;

and now it does

    bprm->per_clear = 1;

and honestly, there's a lot more semantic context in the old code that
is now missing entirely. At least you used to be able to grep for
PER_CLEAR_ON_SETID and it would make you go "Ahh.."

Put another way, I can kind of see what a line like

    bprm->per_clear |= PER_CLEAR_ON_SETID;

does, simply because now it kind of hints at what is up.

But what the heck does

    bprm->per_clear = 1;

mean? Nothing. You have to really know the code. "per_clear" makes no
sense, and now it's a short line that doesn't need to be that short.

I think "bprm->clear_personality_bits" would maybe describe what the
_effect_ of that field is. It doesn't explain _why_, but it at least
explains "what" much better than "per_clear", which just makes me go
"per what?".

Alternatively, "bprm->creds_changed" would describe what the bit
conceptually is about, and code like

    if (bprm->creds_changed)
        me->personality &= ~PER_CLEAR_ON_SETID;\

looks sensible to me and kind of matches the comment about what the
PER_CLEAR_ON_SETID bits are.

So I think that using a bitfield is fine, but I'd really like it to be
named something much better.

Plus changing the name means that you can't have any code that now
mistakenly uses the new semantics but expects the old bitmask.
Generally when something changes semantics that radically, you want to
make sure the type changes sufficiently that any out-of-tree patch
that hasn't been merged yet will get a clear warning or error if
people don't realize.

Please?

Linus
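The hazard described here, an old-style reader meeting a new-style value, can be shown with a small userspace model. This is a sketch only: the two structs, `apply_old()`, `apply_new()` and `demo_semantic_mismatch()` are hypothetical, `clear_personality_bits` is just one of the names floated above, and `DEMO_PER_CLEAR_ON_SETID` is an illustrative copy of the combined mask rather than the kernel's own definition.

```c
#include <assert.h>

/*
 * Illustrative copy of the combined mask (READ_IMPLIES_EXEC |
 * ADDR_NO_RANDOMIZE | ADDR_COMPAT_LAYOUT | MMAP_PAGE_ZERO).
 */
#define DEMO_PER_CLEAR_ON_SETID 0x0740000u

/* Old semantics: the field carries the personality bits to clear. */
struct demo_bprm_old {
	unsigned int per_clear;
};

/* New semantics: a one-bit flag, renamed so stale mask users stop compiling. */
struct demo_bprm_new {
	unsigned int clear_personality_bits:1;
};

unsigned int apply_old(unsigned int personality, const struct demo_bprm_old *b)
{
	return personality & ~b->per_clear;
}

unsigned int apply_new(unsigned int personality, const struct demo_bprm_new *b)
{
	if (b->clear_personality_bits)
		personality &= ~DEMO_PER_CLEAR_ON_SETID;
	return personality;
}

void demo_semantic_mismatch(void)
{
	unsigned int personality = DEMO_PER_CLEAR_ON_SETID | 0x8u;
	/* A new-style "= 1" store read by old-style mask code. */
	struct demo_bprm_old stale = { .per_clear = 1 };
	struct demo_bprm_new fresh = { .clear_personality_bits = 1 };

	/* The old reader clears only bit 0, so none of the dangerous bits go away. */
	assert(apply_old(personality, &stale) == personality);
	/* The flag-based reader clears the whole mask and keeps unrelated bits. */
	assert(apply_new(personality, &fresh) == 0x8u);
}
```

With the same field name in both structs the mismatch would compile silently; giving the flag a different name turns the stale access into a build error, which is the argument for renaming.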
As per the naming of "per_clear", I find the "active_per_clear" name
even more confusing. It has all the same issues, but doubled down.

What does "active" mean?

Linus
On Thu, May 28, 2020 at 8:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> It makes no sense to set active_per_clear when the kernel decides not
> to honor the executable's setuid or setgid bits. Instead set
> active_per_clear when the kernel actually decides to honor the suid or
> sgid permission bits of an executable.
You seem to be confused about the naming yourself.
You talk about "active_per_clear", but the code is about "per_clear". WTF?
Linus
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, May 28, 2020 at 8:45 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> - me->personality &= ~bprm->per_clear;
>> + if (bprm->per_clear)
>> + me->personality &= ~PER_CLEAR_ON_SETID;\
>
> My only problem with this patch is that I find that 'per_clear' thing
> to be a horrid horrid name,
>
> Obviously the name didn't change, but the use *did* change, and as
> such the name got worse. It used to do things like
>
> bprm->per_clear |= PER_CLEAR_ON_SETID;
>
> and now it does
>
> bprm->per_clear = 1;
>
> and honestly, there's a lot more semantic context in the old code that
> is now missing entirely. At least you used to be able to grep for
> PER_CLEAR_ON_SETID and it would make you go "Ahh.."
>
> Put another way, I can kind of see what a line like
>
> bprm->per_clear |= PER_CLEAR_ON_SETID;
>
> does, simply because now it kind of hints at what is up.
>
> But what the heck does
>
> bprm->per_clear = 1;
>
> mean? Nothing. You have to really know the code. "per_clear" makes no
> sense, and now it's a short line that doesn't need to be that short.
>
> I think "bprm->clear_personality_bits" would maybe describe what the
> _effect_ of that field is. It doesn't explain _why_, but it at least
> explains "what" much better than "per_clear", which just makes me go
> "per what?".
>
> Alternatively, "bprm->creds_changed" would describe what the bit
> conceptually is about, and code like
>
> if (bprm->creds_changed)
> me->personality &= ~PER_CLEAR_ON_SETID;\
>
> looks sensible to me and kind of matches the comment about what the
> PER_CLEAR_ON_SETID bits are.
>
> So I think that using a bitfield is fine, but I'd really like it to be
> named something much better.
>
> Plus changing the name means that you can't have any code that now
> mistakenly uses the new semantics but expects the old bitmask.
> Generally when something changes semantics that radically, you want to
> make sure the type changes sufficiently that any out-of-tree patch
> that hasn't been merged yet will get a clear warning or error if
> people don't realize.
>
> Please?

Yes. That will make a very nice change to the patch.

I think I will go with bprm->clear_unsafe_personality_bits or
something to that effect.

I would really love to have a bit that means creds_changed or
privileges_elevated. But right now we have two fields that mean
essentially that (per_clear and secureexec) and they don't agree on when
they get set.

I will make them agree as much as possible, and this patchset is a first
step in that direction but until we can actually make them agree, I want
to keep them both grounded in what they do. That way it is possible to
have a reasonable discussion on when they should be set.

Eric
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, May 28, 2020 at 8:53 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> It makes no sense to set active_per_clear when the kernel decides not
>> to honor the executable's setuid or setgid bits. Instead set
>> active_per_clear when the kernel actually decides to honor the suid or
>> sgid permission bits of an executable.
>
> You seem to be confused about the naming yourself.
>
> You talk about "active_per_clear", but the code is about "per_clear". WTF?

I figured out how to kill active_per_clear (see 3/11) and I failed to
update the patch description here.

I think active_ is a lousy prefix, but since it all goes away in patch 3
when I remove the recomputation and the need to have two versions of the
setting, I think it is probably good enough.

Eric
My last chunk of cleanups was clearly a bit too big, with too many
issues going on, so let's try this again with just the most important
cleanup.

Recomputing the uids, gids, capabilities, and related flags each time a
new bprm->file is set is error prone, and as it turns out unnecessary.

Building upon my previous exec clean up work this set of changes splits
per_clear temporarily into two separate flags, which is the last step in
making the code recompute everything each time a new bprm->file is
considered. Then the code is refactored to run the credential from file
calculation later so that recomputation is not necessary.

Doing this in two steps should allow anyone who has problems later to
bisect and tell if it was the semantic change or the refactoring that
caused them problems.

Eric W. Biederman (2):
      exec: Add a per bprm->file version of per_clear
      exec: Compute file based creds only once

 fs/binfmt_misc.c              |  2 +-
 fs/exec.c                     | 57 ++++++++++++++++++-------------------------
 include/linux/binfmts.h       |  9 ++-----
 include/linux/lsm_hook_defs.h |  2 +-
 include/linux/lsm_hooks.h     | 22 +++++++++--------
 include/linux/security.h      |  9 ++++---
 security/commoncap.c          | 22 +++++++++--------
 security/security.c           |  4 +--
 8 files changed, 59 insertions(+), 68 deletions(-)

---

This builds upon my previous exec cleanup work at:

git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next

Thank you,
Eric
There is a small bug in the code that recomputes parts of bprm->cred
for every bprm->file. The code never recomputes the part of
clear_dangerous_personality_flags it is responsible for.

Which means that in practice if someone creates a sgid script the
interpreter will not be able to use any of:
	READ_IMPLIES_EXEC
	ADDR_NO_RANDOMIZE
	ADDR_COMPAT_LAYOUT
	MMAP_PAGE_ZERO.

This accidental clearing of personality flags probably does not
matter in practice because no one has complained, but it does make
the code more difficult to understand.

Further, remaining bug compatible prevents the recomputation from being
removed and replaced by simply computing bprm->cred once from the
final bprm->file.

Making this change removes the last behavior difference between
computing bprm->creds from the final file and recomputing
bprm->cred several times. This allows the behavior change to be
justified for its own reasons, and any bug hunts looking into why the
behavior changed will wind up here instead of in the code that will
follow, which computes bprm->cred from the final bprm->file.

This small logic bug appears to have existed since the code started
clearing dangerous personality bits.

History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com> --- fs/exec.c | 6 ++++-- include/linux/binfmts.h | 5 +++++ include/linux/lsm_hooks.h | 2 ++ security/commoncap.c | 2 +- 4 files changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index c3c879a55d65..0f793536e393 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1354,6 +1354,7 @@ int begin_new_exec(struct linux_binprm * bprm) me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD | PF_NOFREEZE | PF_NO_SETAFFINITY); flush_thread(); + bprm->per_clear |= bprm->pf_per_clear; me->personality &= ~bprm->per_clear; /* @@ -1628,12 +1629,12 @@ static void bprm_fill_uid(struct linux_binprm *bprm) return; if (mode & S_ISUID) { - bprm->per_clear |= PER_CLEAR_ON_SETID; + bprm->pf_per_clear |= PER_CLEAR_ON_SETID; bprm->cred->euid = uid; } if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) { - bprm->per_clear |= PER_CLEAR_ON_SETID; + bprm->pf_per_clear |= PER_CLEAR_ON_SETID; bprm->cred->egid = gid; } } @@ -1654,6 +1655,7 @@ static int prepare_binprm(struct linux_binprm *bprm) /* Recompute parts of bprm->cred based on bprm->file */ bprm->active_secureexec = 0; + bprm->pf_per_clear = 0; bprm_fill_uid(bprm); retval = security_bprm_repopulate_creds(bprm); if (retval) diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 7fc05929c967..50025ead0b72 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -55,6 +55,11 @@ struct linux_binprm { struct file * file; struct cred *cred; /* new credentials */ int unsafe; /* how unsafe this exec is (mask of LSM_UNSAFE_*) */ + /* + * bits to clear in current->personality + * recalculated for each bprm->file. 
+ */ + unsigned int pf_per_clear; unsigned int per_clear; /* bits to clear in current->personality */ int argc, envc; const char * filename; /* Name of binary as seen by procps */ diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index d618ecc4d660..cd3dd0afceb5 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -55,6 +55,8 @@ * transitions between security domains). * The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to * request libc enable secure mode. + * The hook must set @bprm->pf_per_clear to the personality flags that + * should be cleared from current->personality. * @bprm contains the linux_binprm structure. * Return 0 if the hook is successful and permission is granted. * @bprm_check_security: diff --git a/security/commoncap.c b/security/commoncap.c index 77b04cb6feac..6de72d22dc6c 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -826,7 +826,7 @@ int cap_bprm_repopulate_creds(struct linux_binprm *bprm) /* if we have fs caps, clear dangerous personality flags */ if (__cap_gained(permitted, new, old)) - bprm->per_clear |= PER_CLEAR_ON_SETID; + bprm->pf_per_clear |= PER_CLEAR_ON_SETID; /* Don't let someone trace a set[ug]id/setpcap binary with the revised * credentials unless they have the appropriate permit. -- 2.25.0
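The effect of resetting the per-file mask can be traced with a tiny userspace model of the sgid-script case from the commit message. Everything here is a hypothetical stand-in: `struct demo_bprm` only mirrors the two fields named in the diff, the `demo_*` functions compress `prepare_binprm()` and `begin_new_exec()` down to the lines the patch touches, and `DEMO_PER_CLEAR_ON_SETID` is an illustrative copy of the combined mask.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative copy of the combined PER_CLEAR_ON_SETID mask. */
#define DEMO_PER_CLEAR_ON_SETID 0x0740000u

struct demo_bprm {
	unsigned int per_clear;    /* accumulated for the whole exec */
	unsigned int pf_per_clear; /* recomputed for each candidate file */
};

/* Per-file step: forget the previous file's result, then look at this file. */
void demo_prepare_binprm(struct demo_bprm *b, bool file_is_setid)
{
	b->pf_per_clear = 0;
	if (file_is_setid)
		b->pf_per_clear |= DEMO_PER_CLEAR_ON_SETID;
}

/* Point of no return: merge the final file's mask and apply it. */
unsigned int demo_begin_new_exec(struct demo_bprm *b, unsigned int personality)
{
	b->per_clear |= b->pf_per_clear;
	return personality & ~b->per_clear;
}

/* An sgid script followed by its ordinary interpreter. */
void demo_sgid_script(void)
{
	struct demo_bprm b = { 0 };
	unsigned int personality = DEMO_PER_CLEAR_ON_SETID;

	demo_prepare_binprm(&b, true);  /* the sgid script */
	demo_prepare_binprm(&b, false); /* the interpreter it names */

	/* Only the final file counts, so the personality flags survive. */
	assert(demo_begin_new_exec(&b, personality) == personality);
}
```

Dropping the `b->pf_per_clear = 0;` line reproduces the old behavior in this model: the script's mask leaks into the interpreter's exec and the flags are cleared.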
Move the computation of creds from prepare_binprm into begin_new_exec
so that the creds need only be computed once. This is just code
reorganization; no semantic changes of any kind are made.

Moving the computation is safe. I have looked through the kernel and
verified none of the binfmts look at bprm->cred directly, and that
there are no helpers that look at bprm->cred indirectly. Which means
that it is not a problem to compute the bprm->cred later in the
execution flow as it is not used until it becomes current->cred.

A new function bprm_creds_from_file is added to contain the work that
needs to be done. bprm_creds_from_file first determines which file,
bprm->executable or most likely bprm->file, the bprm->cred will be
computed from.

The function bprm_fill_uid is updated to receive the file instead of
accessing bprm->file. The now unnecessary work needed to reset the
bprm->cred->euid, and bprm->cred->egid is removed from bprm_fill_uid.
A small comment is added to document that bprm_fill_uid now only deals
with the work to handle suid and sgid files. The default case is
already handled by prepare_exec_creds.

The function security_bprm_repopulate_creds is renamed
security_bprm_creds_from_file and now is explicitly passed the file
from which to compute the creds. The documentation of the
bprm_creds_from_file security hook is updated to explain when the hook
is called and what it needs to do. The file is passed from
cap_bprm_creds_from_file into get_file_caps so that the caps are
computed for the appropriate file. The now unnecessary work in
cap_bprm_creds_from_file to reset the ambient capabilities has been
removed. A small comment has been added to document that the work of
cap_bprm_creds_from_file is to read capabilities from the file's
security attribute and derive capabilities from the fact the user had
uid 0.

Signed-off-by: "Eric W.
Biederman" <ebiederm@xmission.com>
---
 fs/binfmt_misc.c              |  2 +-
 fs/exec.c                     | 63 +++++++++++++++--------------------
 include/linux/binfmts.h       | 14 ++------
 include/linux/lsm_hook_defs.h |  2 +-
 include/linux/lsm_hooks.h     | 22 ++++++------
 include/linux/security.h      |  9 ++---
 security/commoncap.c          | 24 +++++++------
 security/security.c           |  4 +--
 8 files changed, 61 insertions(+), 79 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 53968ea07b57..bc5506619b7e 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -192,7 +192,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
 
 	bprm->interpreter = interp_file;
 	if (fmt->flags & MISC_FMT_CREDENTIALS)
-		bprm->preserve_creds = 1;
+		bprm->execfd_creds = 1;
 
 	retval = 0;
 ret:
diff --git a/fs/exec.c b/fs/exec.c
index 0f793536e393..e8599236290d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -72,6 +72,8 @@
 
 #include <trace/events/sched.h>
 
+static int bprm_creds_from_file(struct linux_binprm *bprm);
+
 int suid_dumpable = 0;
 
 static LIST_HEAD(formats);
@@ -1304,6 +1306,11 @@ int begin_new_exec(struct linux_binprm * bprm)
 	struct task_struct *me = current;
 	int retval;
 
+	/* Once we are committed compute the creds */
+	retval = bprm_creds_from_file(bprm);
+	if (retval)
+		return retval;
+
 	/*
 	 * Ensure all future errors are fatal.
 	 */
@@ -1354,7 +1361,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
 					PF_NOFREEZE | PF_NO_SETAFFINITY);
 	flush_thread();
-	bprm->per_clear |= bprm->pf_per_clear;
 	me->personality &= ~bprm->per_clear;
 
 	/*
@@ -1365,13 +1371,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	 */
 	do_close_on_exec(me->files);
 
-	/*
-	 * Once here, prepare_binrpm() will not be called any more, so
-	 * the final state of setuid/setgid/fscaps can be merged into the
-	 * secureexec flag.
-	 */
-	bprm->secureexec |= bprm->active_secureexec;
-
 	if (bprm->secureexec) {
 		/* Make sure parent cannot signal privileged process. */
 		me->pdeath_signal = 0;
@@ -1587,29 +1586,21 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	spin_unlock(&p->fs->lock);
 }
 
-static void bprm_fill_uid(struct linux_binprm *bprm)
+static void bprm_fill_uid(struct linux_binprm *bprm, struct file *file)
 {
+	/* Handle suid and sgid on files */
 	struct inode *inode;
 	unsigned int mode;
 	kuid_t uid;
 	kgid_t gid;
 
-	/*
-	 * Since this can be called multiple times (via prepare_binprm),
-	 * we must clear any previous work done when setting set[ug]id
-	 * bits from any earlier bprm->file uses (for example when run
-	 * first for a setuid script then again for its interpreter).
-	 */
-	bprm->cred->euid = current_euid();
-	bprm->cred->egid = current_egid();
-
-	if (!mnt_may_suid(bprm->file->f_path.mnt))
+	if (!mnt_may_suid(file->f_path.mnt))
 		return;
 
 	if (task_no_new_privs(current))
 		return;
 
-	inode = bprm->file->f_path.dentry->d_inode;
+	inode = file->f_path.dentry->d_inode;
 	mode = READ_ONCE(inode->i_mode);
 	if (!(mode & (S_ISUID|S_ISGID)))
 		return;
@@ -1629,19 +1620,31 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
 		return;
 
 	if (mode & S_ISUID) {
-		bprm->pf_per_clear |= PER_CLEAR_ON_SETID;
+		bprm->per_clear |= PER_CLEAR_ON_SETID;
 		bprm->cred->euid = uid;
 	}
 
 	if ((mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
-		bprm->pf_per_clear |= PER_CLEAR_ON_SETID;
+		bprm->per_clear |= PER_CLEAR_ON_SETID;
 		bprm->cred->egid = gid;
 	}
 }
 
+/*
+ * Compute brpm->cred based upon the final binary.
+ */
+static int bprm_creds_from_file(struct linux_binprm *bprm)
+{
+	/* Compute creds based on which file? */
+	struct file *file = bprm->execfd_creds ? bprm->executable : bprm->file;
+
+	bprm_fill_uid(bprm, file);
+	return security_bprm_creds_from_file(bprm, file);
+}
+
 /*
  * Fill the binprm structure from the inode.
- * Check permissions, then read the first BINPRM_BUF_SIZE bytes
+ * Read the first BINPRM_BUF_SIZE bytes
  *
  * This may be called multiple times for binary chains (scripts for example).
  */
@@ -1649,20 +1652,6 @@ static int prepare_binprm(struct linux_binprm *bprm)
 {
 	loff_t pos = 0;
 
-	/* Can the interpreter get to the executable without races? */
-	if (!bprm->preserve_creds) {
-		int retval;
-
-		/* Recompute parts of bprm->cred based on bprm->file */
-		bprm->active_secureexec = 0;
-		bprm->pf_per_clear = 0;
-		bprm_fill_uid(bprm);
-		retval = security_bprm_repopulate_creds(bprm);
-		if (retval)
-			return retval;
-	}
-	bprm->preserve_creds = 0;
-
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
 }
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 50025ead0b72..aece1b340e7d 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -29,13 +29,8 @@ struct linux_binprm {
 		/* Should an execfd be passed to userspace? */
 		have_execfd:1,
 
-		/* It is safe to use the creds of a script (see binfmt_misc) */
-		preserve_creds:1,
-		/*
-		 * True if most recent call to security_bprm_set_creds
-		 * resulted in elevated privileges.
-		 */
-		active_secureexec:1,
+		/* Use the creds of a script (see binfmt_misc) */
+		execfd_creds:1,
 		/*
 		 * Set by bprm_creds_for_exec hook to indicate a
 		 * privilege-gaining exec has happened. Used to set
@@ -55,11 +50,6 @@ struct linux_binprm {
 	struct file * file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
-	/*
-	 * bits to clear in current->personality
-	 * recalculated for each bprm->file.
-	 */
-	unsigned int pf_per_clear;
 	unsigned int per_clear;	/* bits to clear in current->personality */
 	int argc, envc;
 	const char * filename;	/* Name of binary as seen by procps */
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 1e295ba12c0d..adbc6603abba 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -50,7 +50,7 @@ LSM_HOOK(int, 0, settime, const struct timespec64 *ts,
 	 const struct timezone *tz)
 LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages)
 LSM_HOOK(int, 0, bprm_creds_for_exec, struct linux_binprm *bprm)
-LSM_HOOK(int, 0, bprm_repopulate_creds, struct linux_binprm *bprm)
+LSM_HOOK(int, 0, bprm_creds_from_file, struct linux_binprm *bprm, struct file *file)
 LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committed_creds, struct linux_binprm *bprm)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index cd3dd0afceb5..37bb3df751c6 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -44,18 +44,18 @@
  *	request libc enable secure mode.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
- * @bprm_repopulate_creds:
- *	Assuming that the relevant bits of @bprm->cred->security have been
- *	previously set, examine @bprm->file and regenerate them. This is
- *	so that the credentials derived from the interpreter the code is
- *	actually going to run are used rather than credentials derived
- *	from a script. This done because the interpreter binary needs to
- *	reopen script, and may end up opening something completely different.
- *	This hook may also optionally check permissions (e.g. for
- *	transitions between security domains).
- *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
+ * @bprm_creds_from_file:
+ *	If @file is setpcap, suid, sgid or otherwise marked to change
+ *	privilege upon exec, update @bprm->cred to reflect that change.
+ *	This is called after finding the binary that will be executed.
+ *	without an interpreter. This ensures that the credentials will not
+ *	be derived from a script that the binary will need to reopen, which
+ *	when reopend may end up being a completely different file. This
+ *	hook may also optionally check permissions (e.g. for transitions
+ *	between security domains).
+ *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
  *	request libc enable secure mode.
- *	The hook must set @bprm->pf_per_clear to the personality flags that
+ *	The hook must set @bprm->per_clear to the personality flags that
  *	should be cleared from current->personality.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
diff --git a/include/linux/security.h b/include/linux/security.h
index 6dcec9375e8f..8444fae7c5b9 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -140,7 +140,7 @@ extern int cap_capset(struct cred *new, const struct cred *old,
 		      const kernel_cap_t *effective,
 		      const kernel_cap_t *inheritable,
 		      const kernel_cap_t *permitted);
-extern int cap_bprm_repopulate_creds(struct linux_binprm *bprm);
+extern int cap_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file);
 extern int cap_inode_setxattr(struct dentry *dentry, const char *name,
 			      const void *value, size_t size, int flags);
 extern int cap_inode_removexattr(struct dentry *dentry, const char *name);
@@ -277,7 +277,7 @@ int security_syslog(int type);
 int security_settime64(const struct timespec64 *ts, const struct timezone *tz);
 int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
 int security_bprm_creds_for_exec(struct linux_binprm *bprm);
-int security_bprm_repopulate_creds(struct linux_binprm *bprm);
+int security_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file);
 int security_bprm_check(struct linux_binprm *bprm);
 void security_bprm_committing_creds(struct linux_binprm *bprm);
 void security_bprm_committed_creds(struct linux_binprm *bprm);
@@ -575,9 +575,10 @@ static inline int security_bprm_creds_for_exec(struct linux_binprm *bprm)
 	return 0;
 }
 
-static inline int security_bprm_repopulate_creds(struct linux_binprm *bprm)
+static inline int security_bprm_creds_from_file(struct linux_binprm *bprm,
+						struct file *file)
 {
-	return cap_bprm_repopulate_creds(bprm);
+	return cap_bprm_creds_from_file(bprm, file);
 }
 
 static inline int security_bprm_check(struct linux_binprm *bprm)
diff --git a/security/commoncap.c b/security/commoncap.c
index 6de72d22dc6c..59bf3c1674c8 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -647,7 +647,8 @@ int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data
  * its xattrs and, if present, apply them to the proposed credentials being
  * constructed by execve().
  */
-static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_fcap)
+static int get_file_caps(struct linux_binprm *bprm, struct file *file,
+			 bool *effective, bool *has_fcap)
 {
 	int rc = 0;
 	struct cpu_vfs_cap_data vcaps;
@@ -657,7 +658,7 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_f
 	if (!file_caps_enabled)
 		return 0;
 
-	if (!mnt_may_suid(bprm->file->f_path.mnt))
+	if (!mnt_may_suid(file->f_path.mnt))
 		return 0;
 
 	/*
@@ -665,10 +666,10 @@ static int get_file_caps(struct linux_binprm *bprm, bool *effective, bool *has_f
 	 * explicit that capability bits are limited to s_user_ns and its
 	 * descendants.
 	 */
-	if (!current_in_userns(bprm->file->f_path.mnt->mnt_sb->s_user_ns))
+	if (!current_in_userns(file->f_path.mnt->mnt_sb->s_user_ns))
 		return 0;
 
-	rc = get_vfs_caps_from_disk(bprm->file->f_path.dentry, &vcaps);
+	rc = get_vfs_caps_from_disk(file->f_path.dentry, &vcaps);
 	if (rc < 0) {
 		if (rc == -EINVAL)
 			printk(KERN_NOTICE "Invalid argument reading file caps for %s\n",
@@ -797,26 +798,27 @@ static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old,
 }
 
 /**
- * cap_bprm_repopulate_creds - Set up the proposed credentials for execve().
+ * cap_bprm_creds_from_file - Set up the proposed credentials for execve().
  * @bprm: The execution parameters, including the proposed creds
+ * @file: The file to pull the credentials from
  *
  * Set up the proposed credentials for a new execution context being
  * constructed by execve(). The proposed creds in @bprm->cred is altered,
  * which won't take effect immediately. Returns 0 if successful, -ve on error.
  */
-int cap_bprm_repopulate_creds(struct linux_binprm *bprm)
+int cap_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file)
 {
+	/* Process setpcap binaries and capabilities for uid 0 */
 	const struct cred *old = current_cred();
 	struct cred *new = bprm->cred;
 	bool effective = false, has_fcap = false, is_setid;
 	int ret;
 	kuid_t root_uid;
 
-	new->cap_ambient = old->cap_ambient;
 	if (WARN_ON(!cap_ambient_invariant_ok(old)))
 		return -EPERM;
 
-	ret = get_file_caps(bprm, &effective, &has_fcap);
+	ret = get_file_caps(bprm, file, &effective, &has_fcap);
 	if (ret < 0)
 		return ret;
 
@@ -826,7 +828,7 @@ int cap_bprm_repopulate_creds(struct linux_binprm *bprm)
 
 	/* if we have fs caps, clear dangerous personality flags */
 	if (__cap_gained(permitted, new, old))
-		bprm->pf_per_clear |= PER_CLEAR_ON_SETID;
+		bprm->per_clear |= PER_CLEAR_ON_SETID;
 
 	/* Don't let someone trace a set[ug]id/setpcap binary with the revised
 	 * credentials unless they have the appropriate permit.
@@ -889,7 +891,7 @@ int cap_bprm_repopulate_creds(struct linux_binprm *bprm)
 	    (!__is_real(root_uid, new) &&
 	     (effective ||
 	      __cap_grew(permitted, ambient, new))))
-		bprm->active_secureexec = 1;
+		bprm->secureexec = 1;
 
 	return 0;
 }
@@ -1346,7 +1348,7 @@ static struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(ptrace_traceme, cap_ptrace_traceme),
 	LSM_HOOK_INIT(capget, cap_capget),
 	LSM_HOOK_INIT(capset, cap_capset),
-	LSM_HOOK_INIT(bprm_repopulate_creds, cap_bprm_repopulate_creds),
+	LSM_HOOK_INIT(bprm_creds_from_file, cap_bprm_creds_from_file),
 	LSM_HOOK_INIT(inode_need_killpriv, cap_inode_need_killpriv),
 	LSM_HOOK_INIT(inode_killpriv, cap_inode_killpriv),
 	LSM_HOOK_INIT(inode_getsecurity, cap_inode_getsecurity),
diff --git a/security/security.c b/security/security.c
index b890b7e2a765..259b8e750aa2 100644
--- a/security/security.c
+++ b/security/security.c
@@ -828,9 +828,9 @@ int security_bprm_creds_for_exec(struct linux_binprm *bprm)
 	return call_int_hook(bprm_creds_for_exec, 0, bprm);
 }
 
-int security_bprm_repopulate_creds(struct linux_binprm *bprm)
+int security_bprm_creds_from_file(struct linux_binprm *bprm, struct file *file)
 {
-	return call_int_hook(bprm_repopulate_creds, 0, bprm);
+	return call_int_hook(bprm_creds_from_file, 0, bprm, file);
 }
 
 int security_bprm_check(struct linux_binprm *bprm)
-- 
2.25.0
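The file selection that the patch adds in bprm_creds_from_file can be modelled outside the kernel. What follows is only a minimal userspace sketch: struct demo_file, struct demo_bprm and demo_creds_source are hypothetical stand-ins invented for illustration, not kernel types or functions, and the only thing carried over from the patch is the "execfd_creds ? executable : file" choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only identity matters here. */
struct demo_file {
	int id;
};

/* Hypothetical stand-in for the few linux_binprm fields the patch consults. */
struct demo_bprm {
	unsigned int execfd_creds:1;   /* set by binfmt_misc for MISC_FMT_CREDENTIALS */
	struct demo_file *executable;  /* file kept aside for the execfd case */
	struct demo_file *file;        /* file currently being loaded */
};

/*
 * Pick the single file the credentials are derived from, using the same
 * conditional as the patched bprm_creds_from_file().
 */
static struct demo_file *demo_creds_source(const struct demo_bprm *bprm)
{
	assert(bprm != NULL);
	return bprm->execfd_creds ? bprm->executable : bprm->file;
}
```

In this model the choice is made once, at the point the exec is committed, rather than on every pass through prepare_binprm.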
On Fri, May 29, 2020 at 11:46:40AM -0500, Eric W. Biederman wrote:
>
> There is a small bug in the code that recomputes parts of bprm->cred
> for every bprm->file. The code never recomputes the part of
> clear_dangerous_personality_flags it is responsible for.
>
> Which means that in practice if someone creates a sgid script
> the interpreter will not be able to use any of:
> READ_IMPLIES_EXEC
> ADDR_NO_RANDOMIZE
> ADDR_COMPAT_LAYOUT
> MMAP_PAGE_ZERO.
>
> This accidental clearing of personality flags probably does
> not matter in practice because no one has complained,
> but it does make the code more difficult to understand.
>
> Further, remaining bug compatible prevents the recomputation from being
> removed and replaced by simply computing bprm->cred once from the
> final bprm->file.
>
> Making this change removes the last behavior difference between
> computing bprm->cred from the final file and recomputing
> bprm->cred several times. This allows the behavior change
> to be justified for its own reasons, and lets any bug hunts
> looking into why the behavior changed wind up here instead
> of in the code that will follow that computes bprm->cred
> from the final bprm->file.
>
> This small logic bug appears to have existed since the code
> started clearing dangerous personality bits.
>
> History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Yup, this looks good. Pointless nit because it's removed in the next
patch, but pf_per_clear is following the same behavioral pattern as
active_secureexec, so it could be named active_per_clear. But since this
has already been bikeshedded in v1, it's fine! :)
Reviewed-by: Kees Cook <keescook@chromium.org>
I wish we had more robust execve tests. :(
--
Kees Cook
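The sgid-script behavior described in the quoted commit message can be reproduced in a small userspace model. This is a sketch under stated assumptions: the struct names, DEMO_PER_CLEAR_ON_SETID and both demo_* functions are hypothetical, and each "file" is reduced to a single flag saying whether it is setid. The first function accumulates bits across every file in the chain, as the commit message says the old code did; the second resets a per-file field for each file and merges it once at the end, which is the pf_per_clear pattern.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; the real PER_CLEAR_ON_SETID is a kernel mask. */
#define DEMO_PER_CLEAR_ON_SETID 0x1u

/* Hypothetical file model: only "is it suid/sgid" matters here. */
struct demo_file {
	bool setid;
};

struct demo_bprm {
	unsigned int per_clear;     /* bits applied to the personality at exec */
	unsigned int pf_per_clear;  /* bits recomputed for each file */
};

/* Old pattern: bits from every file in the chain stick. */
static unsigned int demo_accumulating(const struct demo_file *chain, int n)
{
	struct demo_bprm bprm = { 0, 0 };

	for (int i = 0; i < n; i++)
		if (chain[i].setid)
			bprm.per_clear |= DEMO_PER_CLEAR_ON_SETID;
	return bprm.per_clear;
}

/* Per-file pattern: reset for each file, merge only the final result. */
static unsigned int demo_per_file(const struct demo_file *chain, int n)
{
	struct demo_bprm bprm = { 0, 0 };

	assert(n > 0);
	for (int i = 0; i < n; i++) {
		bprm.pf_per_clear = 0;
		if (chain[i].setid)
			bprm.pf_per_clear |= DEMO_PER_CLEAR_ON_SETID;
	}
	bprm.per_clear |= bprm.pf_per_clear;
	return bprm.per_clear;
}
```

With a chain of { sgid script, plain interpreter }, the accumulating version still reports the clear bit while the per-file version does not, which is the difference the commit message is pointing at.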
On Fri, May 29, 2020 at 11:47:29AM -0500, Eric W. Biederman wrote:
> Move the computation of creds from prepare_binprm into begin_new_exec
> so that the creds need only be computed once. This is just code
> reorganization; no semantic changes of any kind are made.
>
> Moving the computation is safe. I have looked through the kernel and
> verified none of the binfmts look at bprm->cred directly, and that
> there are no helpers that look at bprm->cred indirectly. Which means
> that it is not a problem to compute the bprm->cred later in the
> execution flow as it is not used until it becomes current->cred.
>
> A new function bprm_creds_from_file is added to contain the work that
> needs to be done. bprm_creds_from_file first computes which file,
> bprm->executable or most likely bprm->file, the bprm->cred
> will be computed from.
>
> The function bprm_fill_uid is updated to receive the file instead of
> accessing bprm->file. The now unnecessary work needed to reset the
> bprm->cred->euid, and bprm->cred->egid is removed from bprm_fill_uid.
> A small comment is added to document that bprm_fill_uid now only deals
> with the work to handle suid and sgid files. The default case is already
> handled by prepare_exec_creds.
>
> The function security_bprm_repopulate_creds is renamed
> security_bprm_creds_from_file and now is explicitly passed the file
> from which to compute the creds. The documentation of the
> bprm_creds_from_file security hook is updated to explain when the hook
> is called and what it needs to do. The file is passed from
> cap_bprm_creds_from_file into get_file_caps so that the caps are
> computed for the appropriate file. The now unnecessary work in
> cap_bprm_creds_from_file to reset the ambient capabilities has been
> removed. A small comment to document that the work of
> cap_bprm_creds_from_file is to read capabilities from the file's
> security attribute and derive capabilities from the fact the
> user had uid 0 has been added.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

This all looks good to me. Small notes below...

Reviewed-by: Kees Cook <keescook@chromium.org>

> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index cd3dd0afceb5..37bb3df751c6 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -44,18 +44,18 @@
>   *	request libc enable secure mode.
>   *	@bprm contains the linux_binprm structure.
>   *	Return 0 if the hook is successful and permission is granted.
> - * @bprm_repopulate_creds:
> - *	Assuming that the relevant bits of @bprm->cred->security have been
> - *	previously set, examine @bprm->file and regenerate them. This is
> - *	so that the credentials derived from the interpreter the code is
> - *	actually going to run are used rather than credentials derived
> - *	from a script. This done because the interpreter binary needs to
> - *	reopen script, and may end up opening something completely different.
> - *	This hook may also optionally check permissions (e.g. for
> - *	transitions between security domains).
> - *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
> + * @bprm_creds_from_file:
> + *	If @file is setpcap, suid, sgid or otherwise marked to change
> + *	privilege upon exec, update @bprm->cred to reflect that change.
> + *	This is called after finding the binary that will be executed.
> + *	without an interpreter. This ensures that the credentials will not
> + *	be derived from a script that the binary will need to reopen, which
> + *	when reopend may end up being a completely different file. This
> + *	hook may also optionally check permissions (e.g. for transitions
> + *	between security domains).
> + *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
>   *	request libc enable secure mode.
> - *	The hook must set @bprm->pf_per_clear to the personality flags that
> + *	The hook must set @bprm->per_clear to the personality flags that

Here and the other per_clear comment have language that doesn't quite
line up with how hooks should deal with the bits. They should not "set
it to" the personality flags they want cleared; they need to "add the
bits" they want to see cleared. i.e. I don't want a hook thinking
it is the only one touching per_clear, so they should never do:

	bprm->per_clear = PER_CLEAR_ON_SETID;

but always:

	bprm->per_clear |= PER_CLEAR_ON_SETID;

How about:

	The hook must set @bprm->per_clear with any personality flag bits that

> diff --git a/security/commoncap.c b/security/commoncap.c

Not about this patch, but while looking through this file, I see:

int cap_bprm_set_creds(struct linux_binprm *bprm)
{
	...
	*capability manipulations*

	if (WARN_ON(!cap_ambient_invariant_ok(new)))
		return -EPERM;

	if (nonroot_raised_pE(new, old, root_uid, has_fcap)) {
		ret = audit_log_bprm_fcaps(bprm, new, old);
		if (ret < 0)
			return ret;
	}

	new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);

	if (WARN_ON(!cap_ambient_invariant_ok(new)))
		return -EPERM;
	...
}

The cap_ambient_invariant_ok() test is needlessly repeated: it doesn't
examine securebits, and nonroot_raised_pE appears to have no
side-effects.

One of those can be dropped, yes?

-- 
Kees Cook
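The "=" versus "|=" point above can be shown with two hooks that both touch the same field. This is a minimal userspace sketch, not kernel code: DEMO_FLAG_A, DEMO_FLAG_B, struct demo_bprm and the demo_hook_* functions are all hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical personality bits, standing in for real PER_* masks. */
#define DEMO_FLAG_A 0x1u
#define DEMO_FLAG_B 0x2u

/* Hypothetical stand-in for the one linux_binprm field under discussion. */
struct demo_bprm {
	unsigned int per_clear;
};

/* An earlier hook that correctly adds its bit. */
static void demo_hook_first(struct demo_bprm *bprm)
{
	bprm->per_clear |= DEMO_FLAG_A;
}

/* A later hook that assigns: it silently drops the earlier hook's bit. */
static void demo_hook_assign(struct demo_bprm *bprm)
{
	bprm->per_clear = DEMO_FLAG_B;
}

/* A later hook that ORs: bits from both hooks survive. */
static void demo_hook_or(struct demo_bprm *bprm)
{
	bprm->per_clear |= DEMO_FLAG_B;
}

/* Run the first hook, then the given second hook, and report the result. */
static unsigned int demo_run(void (*second)(struct demo_bprm *))
{
	struct demo_bprm bprm = { 0 };

	assert(second != NULL);
	demo_hook_first(&bprm);
	second(&bprm);
	return bprm.per_clear;
}
```

Calling demo_run(demo_hook_assign) leaves only DEMO_FLAG_B set, whereas demo_run(demo_hook_or) leaves both bits set, which is why the hook documentation is better worded as "add".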
Kees Cook <keescook@chromium.org> writes:

> On Fri, May 29, 2020 at 11:46:40AM -0500, Eric W. Biederman wrote:
>>
>> There is a small bug in the code that recomputes parts of bprm->cred
>> for every bprm->file. The code never recomputes the part of
>> clear_dangerous_personality_flags it is responsible for.
>>
>> Which means that in practice if someone creates a sgid script
>> the interpreter will not be able to use any of:
>> READ_IMPLIES_EXEC
>> ADDR_NO_RANDOMIZE
>> ADDR_COMPAT_LAYOUT
>> MMAP_PAGE_ZERO.
>>
>> This accidental clearing of personality flags probably does
>> not matter in practice because no one has complained,
>> but it does make the code more difficult to understand.
>>
>> Further, remaining bug compatible prevents the recomputation from being
>> removed and replaced by simply computing bprm->cred once from the
>> final bprm->file.
>>
>> Making this change removes the last behavior difference between
>> computing bprm->cred from the final file and recomputing
>> bprm->cred several times. This allows the behavior change
>> to be justified for its own reasons, and lets any bug hunts
>> looking into why the behavior changed wind up here instead
>> of in the code that will follow that computes bprm->cred
>> from the final bprm->file.
>>
>> This small logic bug appears to have existed since the code
>> started clearing dangerous personality bits.
>>
>> History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
>> Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Yup, this looks good. Pointless nit because it's removed in the next
> patch, but pf_per_clear is following the same behavioral pattern as
> active_secureexec, so it could be named active_per_clear. But since this
> has already been bikeshedded in v1, it's fine! :)

That plus it is very much true that active_ isn't a particularly good
prefix. pf_ for per_file seems slightly better.

The only time I can imagine this patch seeing the light of day is if
someone happens to discover that this fixes a bug for them and just
this patch is backported. At which point pf_per_clear pairs with
cap_elevated. So I don't think it hurts.

*Shrug* The next patch is my long term solution to the mess.

> Reviewed-by: Kees Cook <keescook@chromium.org>
>
> I wish we had more robust execve tests. :(

I think you have more skill at writing automated tests than I do. So
feel free to write some.

Eric
Kees Cook <keescook@chromium.org> writes:

> On Fri, May 29, 2020 at 11:47:29AM -0500, Eric W. Biederman wrote:
>> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
>> index cd3dd0afceb5..37bb3df751c6 100644
>> --- a/include/linux/lsm_hooks.h
>> +++ b/include/linux/lsm_hooks.h
>> @@ -44,18 +44,18 @@
>>   *	request libc enable secure mode.
>> - *	The hook must set @bprm->pf_per_clear to the personality flags that
>> + *	The hook must set @bprm->per_clear to the personality flags that
>
> Here and the other per_clear comment have language that doesn't quite
> line up with how hooks should deal with the bits. They should not "set
> it to" the personality flags they want cleared; they need to "add the
> bits" they want to see cleared. i.e. I don't want a hook thinking
> it is the only one touching per_clear, so they should never do:
>
> 	bprm->per_clear = PER_CLEAR_ON_SETID;
>
> but always:
>
> 	bprm->per_clear |= PER_CLEAR_ON_SETID;
>
> How about:
>
> 	The hook must set @bprm->per_clear with any personality flag bits that

Sounds good:

The range-diff winds up being:

1:  c9258ef4879b ! 1:  a7868323c263 exec: Add a per bprm->file version of per_clear
    @@ Commit message

         History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
         Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## fs/exec.c ##
    @@ include/linux/lsm_hooks.h
       *	transitions between security domains).
       *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
       *	request libc enable secure mode.
    -+ *	The hook must set @bprm->pf_per_clear to the personality flags that
    -+ *	should be cleared from current->personality.
    ++ *	The hook must add to @bprm->pf_per_clear any personality flags that
    ++ *	should be cleared from current->personality.
       *	@bprm contains the linux_binprm structure.
       *	Return 0 if the hook is successful and permission is granted.
       * @bprm_check_security:
2:  e6f20c69b96e ! 2:  56305aa9b6fa exec: Compute file based creds only once
    @@ Commit message
         security attribute and derive capabilities from the fact the
         user had uid 0 has been added.

    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

      ## fs/binfmt_misc.c ##
    @@ include/linux/lsm_hooks.h
     + *	between security domains).
     + *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
       *	request libc enable secure mode.
    -- *	The hook must set @bprm->pf_per_clear to the personality flags that
    -+ *	The hook must set @bprm->per_clear to the personality flags that
    - *	should be cleared from current->personality.
    +- *	The hook must add to @bprm->pf_per_clear any personality flags that
    ++ *	The hook must add to @bprm->per_clear any personality flags that
    + *	should be cleared from current->personality.
       *	@bprm contains the linux_binprm structure.
       *	Return 0 if the hook is successful and permission is granted.

>> diff --git a/security/commoncap.c b/security/commoncap.c
>
> Not about this patch, but while looking through this file, I see:
>
> int cap_bprm_set_creds(struct linux_binprm *bprm)
> {
> 	...
> 	*capability manipulations*
>
> 	if (WARN_ON(!cap_ambient_invariant_ok(new)))
> 		return -EPERM;
>
> 	if (nonroot_raised_pE(new, old, root_uid, has_fcap)) {
> 		ret = audit_log_bprm_fcaps(bprm, new, old);
> 		if (ret < 0)
> 			return ret;
> 	}
>
> 	new->securebits &= ~issecure_mask(SECURE_KEEP_CAPS);
>
> 	if (WARN_ON(!cap_ambient_invariant_ok(new)))
> 		return -EPERM;
> 	...
> }
>
> The cap_ambient_invariant_ok() test is needlessly repeated: it doesn't
> examine securebits, and nonroot_raised_pE appears to have no
> side-effects.
>
> One of those can be dropped, yes?

That is what it looks like to me.

I am hoping to take a deep dive into this function after I finish with
bprm_fill_uid (the patches that were dropped). My brain bends on little
details like is_setid not testing if the executable was suid or sgid,
but instead testing something close but unrelated.

I hope that when the dust clears the function can become a
straightforward implementation of the capability equations.
We will see.

Eric
On Fri, May 29, 2020 at 10:23:58PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> > I wish we had more robust execve tests. :(
>
> I think you have more skill at writing automated tests than I do. So
> feel free to write some.
Yeah, my limiting factor is available time. No worries; I didn't mean
it as a request to you -- it was more a commiseration. :)
--
Kees Cook
On Fri, May 29, 2020 at 10:28:41PM -0500, Eric W. Biederman wrote:
> The range-diff winds up being:
>
> 1:  c9258ef4879b ! 1:  a7868323c263 exec: Add a per bprm->file version of per_clear
>     @@ Commit message
>
>          History Tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
>          Fixes: 1bb0fa189c6a ("[PATCH] NX: clean up legacy binary support")
>     +    Reviewed-by: Kees Cook <keescook@chromium.org>
>          Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
>       ## fs/exec.c ##
>     @@ include/linux/lsm_hooks.h
>        *	transitions between security domains).
>        *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
>        *	request libc enable secure mode.
>     -+ *	The hook must set @bprm->pf_per_clear to the personality flags that
>     -+ *	should be cleared from current->personality.
>     ++ *	The hook must add to @bprm->pf_per_clear any personality flags that
>     ++ *	should be cleared from current->personality.
>        *	@bprm contains the linux_binprm structure.
>        *	Return 0 if the hook is successful and permission is granted.
>        * @bprm_check_security:
> 2:  e6f20c69b96e ! 2:  56305aa9b6fa exec: Compute file based creds only once
>     @@ Commit message
>          security attribute and derive capabilities from the fact the
>          user had uid 0 has been added.
>
>     +    Reviewed-by: Kees Cook <keescook@chromium.org>
>          Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
>       ## fs/binfmt_misc.c ##
>     @@ include/linux/lsm_hooks.h
>      + *	between security domains).
>      + *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
>        *	request libc enable secure mode.
>     -- *	The hook must set @bprm->pf_per_clear to the personality flags that
>     -+ *	The hook must set @bprm->per_clear to the personality flags that
>     - *	should be cleared from current->personality.
>     +- *	The hook must add to @bprm->pf_per_clear any personality flags that
>     ++ *	The hook must add to @bprm->per_clear any personality flags that
>     + *	should be cleared from current->personality.
>        *	@bprm contains the linux_binprm structure.
>        *	Return 0 if the hook is successful and permission is granted.

Awesome; thanks!

> > The cap_ambient_invariant_ok() test is needlessly repeated: it doesn't
> > examine securebits, and nonroot_raised_pE appears to have no
> > side-effects.
> >
> > One of those can be dropped, yes?
>
> That is what it looks like to me.

Okay, cool. I was worried I was missing something in the mess of tiny
helper calls. :)

> I hope that when the dust clears the function can become a
> straightforward implementation of the capability equations.
> We will see.

Yeah, this looks better and better every day! I'm glad you're able to
dig through all of this.

-- 
Kees Cook