Linux-Fsdevel Archive on lore.kernel.org
* exec: Promised cleanups after introducing exec_update_mutex
@ 2020-05-05 19:39 Eric W. Biederman
  2020-05-05 19:41 ` [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf Eric W. Biederman
                   ` (8 more replies)
  0 siblings, 9 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-05 19:39 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


In the patchset that introduced exec_update_mutex there were a few
last-minute discoveries and fixes that left the code in a state that can
very easily be improved.

During the merge window we discussed the first three of these patches
and I promised I would resend them.

What the first patch does is make the calls in the binfmts:
	flush_old_exec();
	/* set the personality */
	setup_new_exec();
	install_exec_creds();

with no sleeps or anything else in between.

At the conclusion of this set of changes the calls in the binfmts
are:
	begin_new_exec();
	/* set the personality */
	setup_new_exec();

The intent is to make the code easier to follow and easier to change.
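To make the final calling convention concrete, here is a minimal
userspace model of it. It is a sketch under stated assumptions, not
kernel code: every model_* name and struct field is hypothetical, the
two mutexes are reduced to booleans, and the model follows what the
patches below describe: cred_guard_mutex is taken when the credentials
are prepared, exec_update_mutex is taken when the new mm is installed
inside begin_new_exec, the new credentials are committed there too, and
setup_new_exec drops both locks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the exec state; none of these names exist in
 * the kernel. The two mutexes are reduced to booleans. */
struct model_bprm {
	bool cred_guard_locked;   /* models signal->cred_guard_mutex  */
	bool exec_update_locked;  /* models signal->exec_update_mutex */
	bool point_of_no_return;
	bool creds_committed;
	int personality;
};

/* Models prepare_bprm_creds(): cred_guard_mutex is taken before any
 * binfmt loader runs. */
static void model_prepare_creds(struct model_bprm *b)
{
	b->cred_guard_locked = true;
}

/* Models begin_new_exec(): installing the new mm takes
 * exec_update_mutex and passes the point of no return; the new
 * credentials are committed while cred_guard_mutex is still held. */
static int model_begin_new_exec(struct model_bprm *b)
{
	assert(b->cred_guard_locked);
	b->exec_update_locked = true;
	b->point_of_no_return = true;
	b->creds_committed = true;
	return 0;
}

/* Models setup_new_exec(): the personality-dependent setup (not
 * modeled here) followed by dropping both locks. */
static void model_setup_new_exec(struct model_bprm *b)
{
	b->exec_update_locked = false;
	b->cred_guard_locked = false;
}

/* The shape a binfmt loader has at the end of the series. */
static int model_load_binary(struct model_bprm *b, int personality)
{
	int retval = model_begin_new_exec(b);

	if (retval)
		return retval;
	b->personality = personality;	/* set the personality */
	model_setup_new_exec(b);
	return 0;
}
```

After a successful load in this model, both locks are released and the
task is past the point of no return, with nothing left for the loader
to remember to call.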

Eric W. Biederman (7):
      binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf
      exec: Make unlocking exec_update_mutex explict
      exec: Rename the flag called_exec_mmap point_of_no_return
      exec: Merge install_exec_creds into setup_new_exec
      exec: In setup_new_exec cache current in the local variable me
      exec: Move most of setup_new_exec into flush_old_exec
      exec: Rename flush_old_exec begin_new_exec

 Documentation/trace/ftrace.rst |   2 +-
 arch/x86/ia32/ia32_aout.c      |   4 +-
 fs/binfmt_aout.c               |   3 +-
 fs/binfmt_elf.c                |   3 +-
 fs/binfmt_elf_fdpic.c          |   3 +-
 fs/binfmt_flat.c               |   4 +-
 fs/exec.c                      | 162 ++++++++++++++++++++---------------------
 include/linux/binfmts.h        |  10 +--
 kernel/events/core.c           |   2 +-
 9 files changed, 92 insertions(+), 101 deletions(-)

---

These changes are against v5.7-rc3.

My intention, once everything passes code review, is to place these
changes in a topic branch in my tree and then into linux-next, and
eventually to send Linus a pull request when the next merge window
opens, unless someone has a better idea.

I am a little concerned that I might conflict with the ongoing coredump
cleanups.

I have several follow-up sets of changes with additional cleanups as
well, but I am trying to keep everything small enough that the code can
be reviewed.

After enough cleanups I hope to reopen the conversation about dealing
with the livelock situation with cred_guard_mutex, as I think figuring
out what to do becomes much easier once several of my planned
cleanups/improvements have been made.

But ultimately I just want to get exec to the point where, when we have
discussions on how to make exec better, the code is in good enough
shape that we can actually address the issues we see.

Eric

* [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
@ 2020-05-05 19:41 ` Eric W. Biederman
  2020-05-05 20:45   ` Kees Cook
  2020-05-06 12:42   ` Greg Ungerer
  2020-05-05 19:41 ` [PATCH 2/7] exec: Make unlocking exec_update_mutex explict Eric W. Biederman
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-05 19:41 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


In 2016 Linus moved install_exec_creds to immediately after
setup_new_exec in binfmt_elf, as a cleanup and as part of closing a
potential information leak.

Perform the same cleanup for the other binary formats.

Different binary formats doing the same things the same way makes exec
easier to reason about and easier to maintain.

The binfmt_flat bits were tested by Greg Ungerer <gerg@linux-m68k.org>.

Ref: 9f834ec18def ("binfmt_elf: switch to new creds when switching to new mm")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/ia32/ia32_aout.c | 3 +--
 fs/binfmt_aout.c          | 2 +-
 fs/binfmt_elf_fdpic.c     | 2 +-
 fs/binfmt_flat.c          | 3 +--
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 9bb71abd66bd..37b36a8ce5fa 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -140,6 +140,7 @@ static int load_aout_binary(struct linux_binprm *bprm)
 	set_personality_ia32(false);
 
 	setup_new_exec(bprm);
+	install_exec_creds(bprm);
 
 	regs->cs = __USER32_CS;
 	regs->r8 = regs->r9 = regs->r10 = regs->r11 = regs->r12 =
@@ -156,8 +157,6 @@ static int load_aout_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		return retval;
 
-	install_exec_creds(bprm);
-
 	if (N_MAGIC(ex) == OMAGIC) {
 		unsigned long text_addr, map_size;
 
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index 8e8346a81723..ace587b66904 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -162,6 +162,7 @@ static int load_aout_binary(struct linux_binprm * bprm)
 	set_personality(PER_LINUX);
 #endif
 	setup_new_exec(bprm);
+	install_exec_creds(bprm);
 
 	current->mm->end_code = ex.a_text +
 		(current->mm->start_code = N_TXTADDR(ex));
@@ -174,7 +175,6 @@ static int load_aout_binary(struct linux_binprm * bprm)
 	if (retval < 0)
 		return retval;
 
-	install_exec_creds(bprm);
 
 	if (N_MAGIC(ex) == OMAGIC) {
 		unsigned long text_addr, map_size;
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 240f66663543..6c94c6d53d97 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -353,6 +353,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 		current->personality |= READ_IMPLIES_EXEC;
 
 	setup_new_exec(bprm);
+	install_exec_creds(bprm);
 
 	set_binfmt(&elf_fdpic_format);
 
@@ -434,7 +435,6 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 	current->mm->start_stack = current->mm->start_brk + stack_size;
 #endif
 
-	install_exec_creds(bprm);
 	if (create_elf_fdpic_tables(bprm, current->mm,
 				    &exec_params, &interp_params) < 0)
 		goto error;
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 831a2b25ba79..1a1d1fcb893f 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -541,6 +541,7 @@ static int load_flat_file(struct linux_binprm *bprm,
 		/* OK, This is the point of no return */
 		set_personality(PER_LINUX_32BIT);
 		setup_new_exec(bprm);
+		install_exec_creds(bprm);
 	}
 
 	/*
@@ -963,8 +964,6 @@ static int load_flat_binary(struct linux_binprm *bprm)
 		}
 	}
 
-	install_exec_creds(bprm);
-
 	set_binfmt(&flat_format);
 
 #ifdef CONFIG_MMU
-- 
2.20.1


* [PATCH 2/7] exec: Make unlocking exec_update_mutex explict
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
  2020-05-05 19:41 ` [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf Eric W. Biederman
@ 2020-05-05 19:41 ` Eric W. Biederman
  2020-05-05 20:46   ` Kees Cook
  2020-05-05 19:42 ` [PATCH 3/7] exec: Rename the flag called_exec_mmap point_of_no_return Eric W. Biederman
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-05 19:41 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


With install_exec_creds updated to follow immediately after
setup_new_exec, the failure of unshare_sighand is the only
code path where exec_update_mutex is held but not explicitly
unlocked.

Update that code path to explicitly unlock exec_update_mutex.

Remove the unlocking of exec_update_mutex from free_bprm.
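The error-path change can be sketched with a small userspace model. It
is a hypothetical illustration: the model_* names and the boolean lock
fields are assumptions, and exec_mmap and unshare_sighand are reduced
to an unconditional lock and a caller-supplied error code. It shows the
goto-based unwinding the patch adopts: the label that drops
exec_update_mutex sits above the plain out label, so only failures that
happen after the lock is taken pass through it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock-state model; the real code uses struct mutex. */
struct model_signal {
	bool cred_guard_locked;
	bool exec_update_locked;
};

/* Models exec_mmap() taking exec_update_mutex. */
static int model_exec_mmap(struct model_signal *sig)
{
	sig->exec_update_locked = true;
	return 0;
}

/* Models flush_old_exec() after this patch: once exec_update_mutex is
 * held, the unshare_sighand() failure path drops it explicitly via
 * out_unlock instead of leaving that to free_bprm(). */
static int model_flush_old_exec(struct model_signal *sig, int unshare_err)
{
	int retval = model_exec_mmap(sig);

	if (retval)
		goto out;

	retval = unshare_err;		/* models unshare_sighand(me) */
	if (retval)
		goto out_unlock;

	return 0;			/* lock stays held for later steps */

out_unlock:
	sig->exec_update_locked = false;
out:
	return retval;
}

/* Models free_bprm() after this patch: it only releases
 * cred_guard_mutex and no longer tests a flag to decide whether
 * exec_update_mutex is held. */
static void model_free_bprm(struct model_signal *sig)
{
	sig->cred_guard_locked = false;
}
```

With the unlock next to the lock's owner, each lock has exactly one
place that releases it on any given path, which is easier to audit than
a flag consulted in a distant cleanup function.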

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c               | 6 +++---
 include/linux/binfmts.h | 3 +--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 06b4c550af5d..6bd82a007bfc 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1344,7 +1344,7 @@ int flush_old_exec(struct linux_binprm * bprm)
 	 */
 	retval = unshare_sighand(me);
 	if (retval)
-		goto out;
+		goto out_unlock;
 
 	set_fs(USER_DS);
 	me->flags &= ~(PF_RANDOMIZE | PF_FORKNOEXEC | PF_KTHREAD |
@@ -1361,6 +1361,8 @@ int flush_old_exec(struct linux_binprm * bprm)
 	do_close_on_exec(me->files);
 	return 0;
 
+out_unlock:
+	mutex_unlock(&me->signal->exec_update_mutex);
 out:
 	return retval;
 }
@@ -1477,8 +1479,6 @@ static void free_bprm(struct linux_binprm *bprm)
 {
 	free_arg_pages(bprm);
 	if (bprm->cred) {
-		if (bprm->called_exec_mmap)
-			mutex_unlock(&current->signal->exec_update_mutex);
 		mutex_unlock(&current->signal->cred_guard_mutex);
 		abort_creds(bprm->cred);
 	}
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index a345d9fed3d8..6f564b9ad882 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -47,8 +47,7 @@ struct linux_binprm {
 		secureexec:1,
 		/*
 		 * Set by flush_old_exec, when exec_mmap has been called.
-		 * This is past the point of no return, when the
-		 * exec_update_mutex has been taken.
+		 * This is past the point of no return.
 		 */
 		called_exec_mmap:1;
 #ifdef __alpha__
-- 
2.20.1


* [PATCH 3/7] exec: Rename the flag called_exec_mmap point_of_no_return
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
  2020-05-05 19:41 ` [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf Eric W. Biederman
  2020-05-05 19:41 ` [PATCH 2/7] exec: Make unlocking exec_update_mutex explict Eric W. Biederman
@ 2020-05-05 19:42 ` Eric W. Biederman
  2020-05-05 20:49   ` Kees Cook
  2020-05-05 19:43 ` [PATCH 4/7] exec: Merge install_exec_creds into setup_new_exec Eric W. Biederman
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-05 19:42 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


Update the comments and make the code easier to understand by
renaming this flag.
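A minimal model of how search_binary_handler uses the renamed flag. The
model_* names and the outcome enum are hypothetical, and the real
function also moves on to other binary formats on -ENOEXEC, which is
left out here. The point is that the same negative return value means
two different things depending on whether point_of_no_return was set
first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; only the flag name matches the kernel. */
struct model_bprm {
	bool point_of_no_return;
};

enum model_outcome {
	MODEL_RETURN_ERROR,	/* execve() fails, old program keeps running */
	MODEL_FORCE_SIGSEGV,	/* the task is killed, as force_sigsegv() does */
	MODEL_SUCCESS,
};

/* A stand-in loader that can fail before or after the new mm has
 * been installed. */
static int model_loader(struct model_bprm *b, bool fail_early, bool fail_late)
{
	if (fail_early)
		return -1;		/* e.g. a bad header: nothing changed yet */
	b->point_of_no_return = true;	/* set once the new mm is installed */
	if (fail_late)
		return -1;
	return 0;
}

/* Models the check in search_binary_handler(). */
static enum model_outcome
model_search_binary_handler(struct model_bprm *b, bool fail_early,
			    bool fail_late)
{
	int retval = model_loader(b, fail_early, fail_late);

	if (retval < 0 && b->point_of_no_return)
		return MODEL_FORCE_SIGSEGV;
	if (retval < 0)
		return MODEL_RETURN_ERROR;
	return MODEL_SUCCESS;
}
```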

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c               | 12 ++++++------
 include/linux/binfmts.h |  6 +++---
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 6bd82a007bfc..71de9f57ae09 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1326,12 +1326,12 @@ int flush_old_exec(struct linux_binprm * bprm)
 		goto out;
 
 	/*
-	 * After setting bprm->called_exec_mmap (to mark that current is
-	 * using the prepared mm now), we have nothing left of the original
-	 * process. If anything from here on returns an error, the check
-	 * in search_binary_handler() will SEGV current.
+	 * With the new mm installed it is completely impossible to
+	 * fail and return to the original process.  If anything from
+	 * here on returns an error, the check in
+	 * search_binary_handler() will SEGV current.
 	 */
-	bprm->called_exec_mmap = 1;
+	bprm->point_of_no_return = true;
 	bprm->mm = NULL;
 
 #ifdef CONFIG_POSIX_TIMERS
@@ -1720,7 +1720,7 @@ int search_binary_handler(struct linux_binprm *bprm)
 
 		read_lock(&binfmt_lock);
 		put_binfmt(fmt);
-		if (retval < 0 && bprm->called_exec_mmap) {
+		if (retval < 0 && bprm->point_of_no_return) {
 			/* we got to flush_old_exec() and failed after it */
 			read_unlock(&binfmt_lock);
 			force_sigsegv(SIGSEGV);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 6f564b9ad882..8f479dad7931 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -46,10 +46,10 @@ struct linux_binprm {
 		 */
 		secureexec:1,
 		/*
-		 * Set by flush_old_exec, when exec_mmap has been called.
-		 * This is past the point of no return.
+		 * Set when errors can no longer be returned to the
+		 * original userspace.
 		 */
-		called_exec_mmap:1;
+		point_of_no_return:1;
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif
-- 
2.20.1


* [PATCH 4/7] exec: Merge install_exec_creds into setup_new_exec
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
                   ` (2 preceding siblings ...)
  2020-05-05 19:42 ` [PATCH 3/7] exec: Rename the flag called_exec_mmap point_of_no_return Eric W. Biederman
@ 2020-05-05 19:43 ` Eric W. Biederman
  2020-05-05 20:50   ` Kees Cook
  2020-05-05 19:44 ` [PATCH 5/7] exec: In setup_new_exec cache current in the local variable me Eric W. Biederman
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-05 19:43 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


The two functions are now always called one right after the
other, so merge them together to make future maintenance easier.
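The ordering inside the merged function matters more than its length.
The sketch below is a hypothetical userspace model (the model_* names
and boolean fields are assumptions, and the LSM hooks are reduced to
comments) of the steps the patch appends to setup_new_exec: commit the
creds, only then decide whether to disable perf monitoring, run the
committed hook while cred_guard_mutex is still held, and drop the locks
last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by the tail of the merged
 * setup_new_exec(); none of these names exist in the kernel. */
struct model_state {
	bool cred_guard_locked;		/* signal->cred_guard_mutex */
	bool exec_update_locked;	/* signal->exec_update_mutex */
	bool cred_pending;		/* bprm->cred != NULL */
	bool new_creds_active;		/* commit_creds() has run */
	bool dumpable_user;		/* get_dumpable() == SUID_DUMP_USER */
	bool perf_events_exited;
	bool committed_hook_under_lock;
};

static void model_install_creds_tail(struct model_state *s)
{
	/* security_bprm_committing_creds(); commit_creds(); bprm->cred = NULL; */
	s->new_creds_active = true;
	s->cred_pending = false;

	/* Only after the commit: stop monitoring a setuid exec. */
	if (s->new_creds_active && !s->dumpable_user)
		s->perf_events_exited = true;

	/* security_bprm_committed_creds() must still run under
	 * cred_guard_mutex so ptrace_attach() cannot interfere. */
	s->committed_hook_under_lock = s->cred_guard_locked;

	/* Drop the locks last, in the order the patch uses. */
	s->exec_update_locked = false;
	s->cred_guard_locked = false;
}
```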

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/x86/ia32/ia32_aout.c |  1 -
 fs/binfmt_aout.c          |  1 -
 fs/binfmt_elf.c           |  1 -
 fs/binfmt_elf_fdpic.c     |  1 -
 fs/binfmt_flat.c          |  1 -
 fs/exec.c                 | 56 ++++++++++++++++++---------------------
 include/linux/binfmts.h   |  1 -
 kernel/events/core.c      |  2 +-
 8 files changed, 27 insertions(+), 37 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 37b36a8ce5fa..8255fdc3a027 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -140,7 +140,6 @@ static int load_aout_binary(struct linux_binprm *bprm)
 	set_personality_ia32(false);
 
 	setup_new_exec(bprm);
-	install_exec_creds(bprm);
 
 	regs->cs = __USER32_CS;
 	regs->r8 = regs->r9 = regs->r10 = regs->r11 = regs->r12 =
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index ace587b66904..c8ba28f285e5 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -162,7 +162,6 @@ static int load_aout_binary(struct linux_binprm * bprm)
 	set_personality(PER_LINUX);
 #endif
 	setup_new_exec(bprm);
-	install_exec_creds(bprm);
 
 	current->mm->end_code = ex.a_text +
 		(current->mm->start_code = N_TXTADDR(ex));
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 13f25e241ac4..e6b586623035 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -858,7 +858,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		current->flags |= PF_RANDOMIZE;
 
 	setup_new_exec(bprm);
-	install_exec_creds(bprm);
 
 	/* Do this so that we can load the interpreter, if need be.  We will
 	   change some of these later */
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 6c94c6d53d97..9a1aa61b4cc3 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -353,7 +353,6 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 		current->personality |= READ_IMPLIES_EXEC;
 
 	setup_new_exec(bprm);
-	install_exec_creds(bprm);
 
 	set_binfmt(&elf_fdpic_format);
 
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 1a1d1fcb893f..252878969582 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -541,7 +541,6 @@ static int load_flat_file(struct linux_binprm *bprm,
 		/* OK, This is the point of no return */
 		set_personality(PER_LINUX_32BIT);
 		setup_new_exec(bprm);
-		install_exec_creds(bprm);
 	}
 
 	/*
diff --git a/fs/exec.c b/fs/exec.c
index 71de9f57ae09..93e40f865523 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1443,6 +1443,31 @@ void setup_new_exec(struct linux_binprm * bprm)
 	   group */
 	WRITE_ONCE(current->self_exec_id, current->self_exec_id + 1);
 	flush_signal_handlers(current, 0);
+
+	/*
+	 * install the new credentials for this executable
+	 */
+	security_bprm_committing_creds(bprm);
+
+	commit_creds(bprm->cred);
+	bprm->cred = NULL;
+
+	/*
+	 * Disable monitoring for regular users
+	 * when executing setuid binaries. Must
+	 * wait until new credentials are committed
+	 * by commit_creds() above
+	 */
+	if (get_dumpable(current->mm) != SUID_DUMP_USER)
+		perf_event_exit_task(current);
+	/*
+	 * cred_guard_mutex must be held at least to this point to prevent
+	 * ptrace_attach() from altering our determination of the task's
+	 * credentials; any time after this it may be unlocked.
+	 */
+	security_bprm_committed_creds(bprm);
+	mutex_unlock(&current->signal->exec_update_mutex);
+	mutex_unlock(&current->signal->cred_guard_mutex);
 }
 EXPORT_SYMBOL(setup_new_exec);
 
@@ -1458,7 +1483,7 @@ EXPORT_SYMBOL(finalize_exec);
 
 /*
  * Prepare credentials and lock ->cred_guard_mutex.
- * install_exec_creds() commits the new creds and drops the lock.
+ * setup_new_exec() commits the new creds and drops the lock.
  * Or, if exec fails before, free_bprm() should release ->cred and
  * and unlock.
  */
@@ -1504,35 +1529,6 @@ int bprm_change_interp(const char *interp, struct linux_binprm *bprm)
 }
 EXPORT_SYMBOL(bprm_change_interp);
 
-/*
- * install the new credentials for this executable
- */
-void install_exec_creds(struct linux_binprm *bprm)
-{
-	security_bprm_committing_creds(bprm);
-
-	commit_creds(bprm->cred);
-	bprm->cred = NULL;
-
-	/*
-	 * Disable monitoring for regular users
-	 * when executing setuid binaries. Must
-	 * wait until new credentials are committed
-	 * by commit_creds() above
-	 */
-	if (get_dumpable(current->mm) != SUID_DUMP_USER)
-		perf_event_exit_task(current);
-	/*
-	 * cred_guard_mutex must be held at least to this point to prevent
-	 * ptrace_attach() from altering our determination of the task's
-	 * credentials; any time after this it may be unlocked.
-	 */
-	security_bprm_committed_creds(bprm);
-	mutex_unlock(&current->signal->exec_update_mutex);
-	mutex_unlock(&current->signal->cred_guard_mutex);
-}
-EXPORT_SYMBOL(install_exec_creds);
-
 /*
  * determine how safe it is to execute the proposed program
  * - the caller must hold ->cred_guard_mutex to protect against
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 8f479dad7931..2a8fddf3574a 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -145,7 +145,6 @@ extern int transfer_args_to_stack(struct linux_binprm *bprm,
 extern int bprm_change_interp(const char *interp, struct linux_binprm *bprm);
 extern int copy_strings_kernel(int argc, const char *const *argv,
 			       struct linux_binprm *bprm);
-extern void install_exec_creds(struct linux_binprm *bprm);
 extern void set_binfmt(struct linux_binfmt *new);
 extern ssize_t read_code(struct file *, unsigned long, loff_t, size_t);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 633b4ae72ed5..169449b5e56b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12217,7 +12217,7 @@ static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
  * When a child task exits, feed back event values to parent events.
  *
  * Can be called with exec_update_mutex held when called from
- * install_exec_creds().
+ * setup_new_exec().
  */
 void perf_event_exit_task(struct task_struct *child)
 {
-- 
2.20.1


* [PATCH 5/7] exec: In setup_new_exec cache current in the local variable me
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
                   ` (3 preceding siblings ...)
  2020-05-05 19:43 ` [PATCH 4/7] exec: Merge install_exec_creds into setup_new_exec Eric W. Biederman
@ 2020-05-05 19:44 ` Eric W. Biederman
  2020-05-05 20:51   ` Kees Cook
  2020-05-05 19:45 ` [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec Eric W. Biederman
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-05 19:44 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


At least gcc 8.3, when generating code for x86_64, has a hard time
consolidating multiple calls to current (aka get_current()), and winds
up unnecessarily rereading %gs:current_task several times in
setup_new_exec.

Caching the value of current in the local variable me generates
slightly better and shorter assembly.
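A userspace sketch of why caching current helps. It assumes a
hypothetical task_struct with only three fields and replaces the
per-CPU %gs read with a counting function; the macro mirrors the shape
of the x86 definition "#define current get_current()". Because a call
with side effects cannot be merged by the compiler, the counter shows
the worst case described above rather than what any particular gcc
version emits.

```c
#include <assert.h>

/* Hypothetical stand-in for task_struct: only the fields used below. */
struct task_struct {
	int pdeath_signal;
	unsigned long sas_ss_sp, sas_ss_size;
};

static struct task_struct the_task;
static int current_reads;	/* how many times "current" was evaluated */

/* Model of get_current(): the kernel does a per-CPU load of
 * current_task; this version only counts the evaluations. */
static struct task_struct *get_current(void)
{
	current_reads++;
	return &the_task;
}
#define current get_current()

/* Before the patch: every use of current is a separate read. */
static void reset_uncached(void)
{
	current->pdeath_signal = 0;
	current->sas_ss_sp = current->sas_ss_size = 0;
}

/* After the patch: read current once into the local variable me. */
static void reset_cached(void)
{
	struct task_struct *me = current;

	me->pdeath_signal = 0;
	me->sas_ss_sp = me->sas_ss_size = 0;
}
```

Here reset_uncached evaluates current three times and reset_cached once;
in the real setup_new_exec the difference grows with every use of
current the patch replaces.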

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 93e40f865523..8c3abafb9bb1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1391,6 +1391,7 @@ EXPORT_SYMBOL(would_dump);
 
 void setup_new_exec(struct linux_binprm * bprm)
 {
+	struct task_struct *me = current;
 	/*
 	 * Once here, prepare_binrpm() will not be called any more, so
 	 * the final state of setuid/setgid/fscaps can be merged into the
@@ -1400,7 +1401,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 
 	if (bprm->secureexec) {
 		/* Make sure parent cannot signal privileged process. */
-		current->pdeath_signal = 0;
+		me->pdeath_signal = 0;
 
 		/*
 		 * For secureexec, reset the stack limit to sane default to
@@ -1413,9 +1414,9 @@ void setup_new_exec(struct linux_binprm * bprm)
 			bprm->rlim_stack.rlim_cur = _STK_LIM;
 	}
 
-	arch_pick_mmap_layout(current->mm, &bprm->rlim_stack);
+	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
 
-	current->sas_ss_sp = current->sas_ss_size = 0;
+	me->sas_ss_sp = me->sas_ss_size = 0;
 
 	/*
 	 * Figure out dumpability. Note that this checking only of current
@@ -1431,18 +1432,18 @@ void setup_new_exec(struct linux_binprm * bprm)
 
 	arch_setup_new_exec();
 	perf_event_exec();
-	__set_task_comm(current, kbasename(bprm->filename), true);
+	__set_task_comm(me, kbasename(bprm->filename), true);
 
 	/* Set the new mm task size. We have to do that late because it may
 	 * depend on TIF_32BIT which is only updated in flush_thread() on
 	 * some architectures like powerpc
 	 */
-	current->mm->task_size = TASK_SIZE;
+	me->mm->task_size = TASK_SIZE;
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
-	WRITE_ONCE(current->self_exec_id, current->self_exec_id + 1);
-	flush_signal_handlers(current, 0);
+	WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
+	flush_signal_handlers(me, 0);
 
 	/*
 	 * install the new credentials for this executable
@@ -1458,16 +1459,16 @@ void setup_new_exec(struct linux_binprm * bprm)
 	 * wait until new credentials are committed
 	 * by commit_creds() above
 	 */
-	if (get_dumpable(current->mm) != SUID_DUMP_USER)
-		perf_event_exit_task(current);
+	if (get_dumpable(me->mm) != SUID_DUMP_USER)
+		perf_event_exit_task(me);
 	/*
 	 * cred_guard_mutex must be held at least to this point to prevent
 	 * ptrace_attach() from altering our determination of the task's
 	 * credentials; any time after this it may be unlocked.
 	 */
 	security_bprm_committed_creds(bprm);
-	mutex_unlock(&current->signal->exec_update_mutex);
-	mutex_unlock(&current->signal->cred_guard_mutex);
+	mutex_unlock(&me->signal->exec_update_mutex);
+	mutex_unlock(&me->signal->cred_guard_mutex);
 }
 EXPORT_SYMBOL(setup_new_exec);
 
-- 
2.20.1


* [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
                   ` (4 preceding siblings ...)
  2020-05-05 19:44 ` [PATCH 5/7] exec: In setup_new_exec cache current in the local variable me Eric W. Biederman
@ 2020-05-05 19:45 ` Eric W. Biederman
  2020-05-05 21:29   ` Kees Cook
  2020-05-05 19:46 ` [PATCH 7/7] exec: Rename flush_old_exec begin_new_exec Eric W. Biederman
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-05 19:45 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


The current idiom for the callers is:

flush_old_exec(bprm);
set_personality(...);
setup_new_exec(bprm);

In 2010 Linus split flush_old_exec into flush_old_exec and
setup_new_exec, with the intention that setup_new_exec be what is
called after the process's new personality is set.

Move the code that doesn't depend upon the personality from
setup_new_exec into flush_old_exec.  This is to facilitate future
changes by having as much code together in one function as possible.
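The split can be pictured with a small model of one
personality-dependent value. The sketch assumes hypothetical model_*
names, reduces TIF_32BIT and the personality to a single boolean, and
uses illustrative x86 address-space limits rather than the kernel's
TASK_SIZE macro; it shows why task_size stays in setup_new_exec while
personality-independent work, such as clearing the signal alternate
stack, can move earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only: an ia32 compat limit and an x86_64
 * 4-level paging limit. */
#define MODEL_TASK_SIZE_32 0xFFFFE000ULL
#define MODEL_TASK_SIZE_64 0x7FFFFFFFF000ULL

struct model_task {
	bool compat_32bit;		/* stands in for TIF_32BIT / personality */
	unsigned long long task_size;
	unsigned long sas_ss_sp;
};

/* Models TASK_SIZE: its value depends on the task's personality. */
static unsigned long long model_task_size(const struct model_task *t)
{
	return t->compat_32bit ? MODEL_TASK_SIZE_32 : MODEL_TASK_SIZE_64;
}

/* Personality-independent work: safe before set_personality(). */
static void model_begin_new_exec(struct model_task *t)
{
	t->sas_ss_sp = 0;
}

/* Personality-dependent work: must run after set_personality(). */
static void model_setup_new_exec(struct model_task *t)
{
	t->task_size = model_task_size(t);
}

static void model_load(struct model_task *t, bool want_32bit)
{
	model_begin_new_exec(t);
	t->compat_32bit = want_32bit;	/* set_personality(...) */
	model_setup_new_exec(t);
}
```

If the task_size assignment were moved into model_begin_new_exec, it
would be computed from the previous program's personality, which is the
ordering hazard the TIF_32BIT comment in setup_new_exec describes.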

Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 85 ++++++++++++++++++++++++++++---------------------------
 1 file changed, 44 insertions(+), 41 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 8c3abafb9bb1..0eff20558735 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm)
 	 * undergoing exec(2).
 	 */
 	do_close_on_exec(me->files);
-	return 0;
-
-out_unlock:
-	mutex_unlock(&me->signal->exec_update_mutex);
-out:
-	return retval;
-}
-EXPORT_SYMBOL(flush_old_exec);
-
-void would_dump(struct linux_binprm *bprm, struct file *file)
-{
-	struct inode *inode = file_inode(file);
-	if (inode_permission(inode, MAY_READ) < 0) {
-		struct user_namespace *old, *user_ns;
-		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
-
-		/* Ensure mm->user_ns contains the executable */
-		user_ns = old = bprm->mm->user_ns;
-		while ((user_ns != &init_user_ns) &&
-		       !privileged_wrt_inode_uidgid(user_ns, inode))
-			user_ns = user_ns->parent;
 
-		if (old != user_ns) {
-			bprm->mm->user_ns = get_user_ns(user_ns);
-			put_user_ns(old);
-		}
-	}
-}
-EXPORT_SYMBOL(would_dump);
-
-void setup_new_exec(struct linux_binprm * bprm)
-{
-	struct task_struct *me = current;
 	/*
 	 * Once here, prepare_binrpm() will not be called any more, so
 	 * the final state of setuid/setgid/fscaps can be merged into the
@@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm)
 			bprm->rlim_stack.rlim_cur = _STK_LIM;
 	}
 
-	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
-
 	me->sas_ss_sp = me->sas_ss_size = 0;
 
 	/*
@@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm)
 	else
 		set_dumpable(current->mm, SUID_DUMP_USER);
 
-	arch_setup_new_exec();
 	perf_event_exec();
 	__set_task_comm(me, kbasename(bprm->filename), true);
 
-	/* Set the new mm task size. We have to do that late because it may
-	 * depend on TIF_32BIT which is only updated in flush_thread() on
-	 * some architectures like powerpc
-	 */
-	me->mm->task_size = TASK_SIZE;
-
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
 	WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
@@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm)
 	 * credentials; any time after this it may be unlocked.
 	 */
 	security_bprm_committed_creds(bprm);
+	return 0;
+
+out_unlock:
+	mutex_unlock(&me->signal->exec_update_mutex);
+out:
+	return retval;
+}
+EXPORT_SYMBOL(flush_old_exec);
+
+void would_dump(struct linux_binprm *bprm, struct file *file)
+{
+	struct inode *inode = file_inode(file);
+	if (inode_permission(inode, MAY_READ) < 0) {
+		struct user_namespace *old, *user_ns;
+		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
+
+		/* Ensure mm->user_ns contains the executable */
+		user_ns = old = bprm->mm->user_ns;
+		while ((user_ns != &init_user_ns) &&
+		       !privileged_wrt_inode_uidgid(user_ns, inode))
+			user_ns = user_ns->parent;
+
+		if (old != user_ns) {
+			bprm->mm->user_ns = get_user_ns(user_ns);
+			put_user_ns(old);
+		}
+	}
+}
+EXPORT_SYMBOL(would_dump);
+
+void setup_new_exec(struct linux_binprm * bprm)
+{
+	/* Setup things that can depend upon the personality */
+	struct task_struct *me = current;
+
+	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
+
+	arch_setup_new_exec();
+
+	/* Set the new mm task size. We have to do that late because it may
+	 * depend on TIF_32BIT which is only updated in flush_thread() on
+	 * some architectures like powerpc
+	 */
+	me->mm->task_size = TASK_SIZE;
 	mutex_unlock(&me->signal->exec_update_mutex);
 	mutex_unlock(&me->signal->cred_guard_mutex);
 }
-- 
2.20.1


* [PATCH 7/7] exec: Rename flush_old_exec begin_new_exec
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
                   ` (5 preceding siblings ...)
  2020-05-05 19:45 ` [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec Eric W. Biederman
@ 2020-05-05 19:46 ` Eric W. Biederman
  2020-05-05 21:30   ` Kees Cook
  2020-05-06 12:41 ` exec: Promised cleanups after introducing exec_update_mutex Greg Ungerer
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
  8 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-05 19:46 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


There is, and has been for a very long time, a lot more going on in
flush_old_exec than just flushing the old state.  After the movement
of code from setup_new_exec there is a whole lot more going on than
just flushing the old executable's state.

Rename flush_old_exec to begin_new_exec to more accurately reflect
what this function does.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 Documentation/trace/ftrace.rst | 2 +-
 arch/x86/ia32/ia32_aout.c      | 2 +-
 fs/binfmt_aout.c               | 2 +-
 fs/binfmt_elf.c                | 2 +-
 fs/binfmt_elf_fdpic.c          | 2 +-
 fs/binfmt_flat.c               | 2 +-
 fs/exec.c                      | 4 ++--
 include/linux/binfmts.h        | 2 +-
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index 3b5614b1d1a5..430a16283103 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -1524,7 +1524,7 @@ display-graph option::
    => remove_vma
    => exit_mmap
    => mmput
-   => flush_old_exec
+   => begin_new_exec
    => load_elf_binary
    => search_binary_handler
    => __do_execve_file.isra.32
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 8255fdc3a027..385d3d172ee1 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -131,7 +131,7 @@ static int load_aout_binary(struct linux_binprm *bprm)
 		return -ENOMEM;
 
 	/* Flush all traces of the currently running executable */
-	retval = flush_old_exec(bprm);
+	retval = begin_new_exec(bprm);
 	if (retval)
 		return retval;
 
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index c8ba28f285e5..3e84e9bb9084 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -151,7 +151,7 @@ static int load_aout_binary(struct linux_binprm * bprm)
 		return -ENOMEM;
 
 	/* Flush all traces of the currently running executable */
-	retval = flush_old_exec(bprm);
+	retval = begin_new_exec(bprm);
 	if (retval)
 		return retval;
 
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index e6b586623035..396d5c2e6b5e 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -844,7 +844,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		goto out_free_dentry;
 
 	/* Flush all traces of the currently running executable */
-	retval = flush_old_exec(bprm);
+	retval = begin_new_exec(bprm);
 	if (retval)
 		goto out_free_dentry;
 
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 9a1aa61b4cc3..896e3ca9bf85 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -338,7 +338,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
 		interp_params.flags |= ELF_FDPIC_FLAG_CONSTDISP;
 
 	/* flush all traces of the currently running executable */
-	retval = flush_old_exec(bprm);
+	retval = begin_new_exec(bprm);
 	if (retval)
 		goto error;
 
diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 252878969582..9b82bc111d0a 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -534,7 +534,7 @@ static int load_flat_file(struct linux_binprm *bprm,
 
 	/* Flush all traces of the currently running executable */
 	if (id == 0) {
-		ret = flush_old_exec(bprm);
+		ret = begin_new_exec(bprm);
 		if (ret)
 			goto err;
 
diff --git a/fs/exec.c b/fs/exec.c
index 0eff20558735..3cc40048cc65 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1298,7 +1298,7 @@ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
  * signal (via de_thread() or coredump), or will have SEGV raised
  * (after exec_mmap()) by search_binary_handlers (see below).
  */
-int flush_old_exec(struct linux_binprm * bprm)
+int begin_new_exec(struct linux_binprm * bprm)
 {
 	struct task_struct *me = current;
 	int retval;
@@ -1433,7 +1433,7 @@ int flush_old_exec(struct linux_binprm * bprm)
 out:
 	return retval;
 }
-EXPORT_SYMBOL(flush_old_exec);
+EXPORT_SYMBOL(begin_new_exec);
 
 void would_dump(struct linux_binprm *bprm, struct file *file)
 {
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 2a8fddf3574a..1b48e2154766 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -125,7 +125,7 @@ extern void unregister_binfmt(struct linux_binfmt *);
 extern int prepare_binprm(struct linux_binprm *);
 extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *);
-extern int flush_old_exec(struct linux_binprm * bprm);
+extern int begin_new_exec(struct linux_binprm * bprm);
 extern void setup_new_exec(struct linux_binprm * bprm);
 extern void finalize_exec(struct linux_binprm *bprm);
 extern void would_dump(struct linux_binprm *, struct file *);
-- 
2.20.1


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf
  2020-05-05 19:41 ` [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf Eric W. Biederman
@ 2020-05-05 20:45   ` Kees Cook
  2020-05-06 12:42   ` Greg Ungerer
  1 sibling, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-05 20:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Tue, May 05, 2020 at 02:41:01PM -0500, Eric W. Biederman wrote:
> 
> In 2016 Linus moved install_exec_creds to immediately after
> setup_new_exec in binfmt_elf, as a cleanup and as part of closing a
> potential information leak.
> 
> Perform the same cleanup for the other binary formats.
> 
> Different binary formats doing the same things the same way makes exec
> easier to reason about and easier to maintain.
> 
> The binfmt_flagt bits were tested by Greg Ungerer <gerg@linux-m68k.org>
> 
> Ref: 9f834ec18def ("binfmt_elf: switch to new creds when switching to new mm")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH 2/7] exec: Make unlocking exec_update_mutex explict
  2020-05-05 19:41 ` [PATCH 2/7] exec: Make unlocking exec_update_mutex explict Eric W. Biederman
@ 2020-05-05 20:46   ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-05 20:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Tue, May 05, 2020 at 02:41:47PM -0500, Eric W. Biederman wrote:
> 
> With install_exec_creds updated to follow immediately after
> setup_new_exec, the failure of unshare_sighand is the only
> code path where exec_update_mutex is held but not explicitly
> unlocked.
> 
> Update that code path to explicitly unlock exec_update_mutex.
> 
> Remove the unlocking of exec_update_mutex from free_bprm.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook
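A minimal userspace model of the error-path shape described in patch 2 may help. Once the lock has been taken, a failure in the unshare_sighand() step jumps to a label that unlocks explicitly, so the final cleanup (free_bprm() in the real code) no longer has to. Everything prefixed `toy_` is hypothetical. The lock is modeled as a plain flag rather than a kernel mutex, the sketch assumes the lock is acquired inside the exec_mmap() step, and `-ENOMEM` is only an assumed failure code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for exec_update_mutex: it only records whether it is held. */
struct toy_mutex {
	bool held;
};

static void toy_lock(struct toy_mutex *m)
{
	assert(!m->held);	/* this model forbids recursive locking */
	m->held = true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);	/* unlocking a lock that is not held is a bug */
	m->held = false;
}

/* Hypothetical stand-in for exec_mmap(): assumed to acquire the lock. */
static int toy_exec_mmap(struct toy_mutex *update_lock)
{
	toy_lock(update_lock);
	return 0;
}

/* Hypothetical stand-in for unshare_sighand(); can be told to fail. */
static int toy_unshare_sighand(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/*
 * A failure after the lock is taken jumps to out_unlock, so a later
 * cleanup routine never has to guess whether the lock is still held.
 */
static int toy_flush_old_exec(struct toy_mutex *update_lock, bool fail_sighand)
{
	int retval;

	retval = toy_exec_mmap(update_lock);
	if (retval)
		goto out;		/* lock was not taken */

	retval = toy_unshare_sighand(fail_sighand);
	if (retval)
		goto out_unlock;	/* lock is held: release it here */

	return 0;			/* success: lock stays held for setup_new_exec() */

out_unlock:
	toy_unlock(update_lock);
out:
	return retval;
}
```

On success the model leaves the lock held, mirroring the quoted code, where setup_new_exec() performs the matching unlock.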

* Re: [PATCH 3/7] exec: Rename the flag called_exec_mmap point_of_no_return
  2020-05-05 19:42 ` [PATCH 3/7] exec: Rename the flag called_exec_mmap point_of_no_return Eric W. Biederman
@ 2020-05-05 20:49   ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-05 20:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Tue, May 05, 2020 at 02:42:26PM -0500, Eric W. Biederman wrote:
> 
> Update the comments and make the code easier to understand by
> renaming this flag.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH 4/7] exec: Merge install_exec_creds into setup_new_exec
  2020-05-05 19:43 ` [PATCH 4/7] exec: Merge install_exec_creds into setup_new_exec Eric W. Biederman
@ 2020-05-05 20:50   ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-05 20:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Tue, May 05, 2020 at 02:43:25PM -0500, Eric W. Biederman wrote:
> 
> The two functions are now always called one right after the
> other so merge them together to make future maintenance easier.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH 5/7] exec: In setup_new_exec cache current in the local variable me
  2020-05-05 19:44 ` [PATCH 5/7] exec: In setup_new_exec cache current in the local variable me Eric W. Biederman
@ 2020-05-05 20:51   ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-05 20:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Tue, May 05, 2020 at 02:44:28PM -0500, Eric W. Biederman wrote:
> 
> At least gcc 8.3 when generating code for x86_64 has a hard time
> consolidating multiple calls to current aka get_current(), and winds
> up unnecessarily rereading %gs:current_task several times in
> setup_new_exec.
> 
> Caching the value of current in the local variable me generates
> slightly better and shorter assembly.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook
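The effect described in patch 5 can be made visible with a small model in which every use of `current` is a call with a side effect, so the compiler cannot merge the reads. This is only an illustration. `toy_get_current()` and its read counter are hypothetical and stand in for the `%gs`-relative load the changelog mentions. In the kernel the cost comes from the compiler not merging those loads, not from a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down task structure for this model. */
struct toy_task {
	unsigned long sas_ss_sp;
	unsigned long sas_ss_size;
	unsigned long long self_exec_id;
};

static struct toy_task *toy_current_task;	/* stands in for the per-CPU current_task */
static unsigned int toy_current_reads;		/* how many times "current" was read */

/* Every call is a fresh read, like a load the compiler will not merge. */
static struct toy_task *toy_get_current(void)
{
	toy_current_reads++;
	return toy_current_task;
}
#define toy_current toy_get_current()

/* Uses "current" three times: three reads. */
static void toy_reset_uncached(void)
{
	toy_current->sas_ss_sp = 0;
	toy_current->sas_ss_size = 0;
	toy_current->self_exec_id++;
}

/* Caches "current" in a local, as the patch does with "me": one read. */
static void toy_reset_cached(void)
{
	struct toy_task *me = toy_current;

	assert(me != NULL);
	me->sas_ss_sp = 0;
	me->sas_ss_size = 0;
	me->self_exec_id++;
}
```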

* Re: [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec
  2020-05-05 19:45 ` [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec Eric W. Biederman
@ 2020-05-05 21:29   ` Kees Cook
  2020-05-06 14:57     ` Eric W. Biederman
  2020-05-07 21:51     ` Eric W. Biederman
  0 siblings, 2 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-05 21:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote:
> 
> The current idiom for the callers is:
> 
> flush_old_exec(bprm);
> set_personality(...);
> setup_new_exec(bprm);
> 
> In 2010 Linus split flush_old_exec into flush_old_exec and
> setup_new_exec, with the intention that setup_new_exec be what is
> called after the process's new personality is set.
> 
> Move the code that doesn't depend upon the personality from
> setup_new_exec into flush_old_exec.  This is to facilitate future
> changes by having as much code together in one function as possible.

Er, I *think* this is okay, but I have some questions below which
maybe you already investigated (and should perhaps get called out in
the changelog).

> 
> Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/exec.c | 85 ++++++++++++++++++++++++++++---------------------------
>  1 file changed, 44 insertions(+), 41 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 8c3abafb9bb1..0eff20558735 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm)
>  	 * undergoing exec(2).
>  	 */
>  	do_close_on_exec(me->files);
> -	return 0;
> -
> -out_unlock:
> -	mutex_unlock(&me->signal->exec_update_mutex);
> -out:
> -	return retval;
> -}
> -EXPORT_SYMBOL(flush_old_exec);
> -
> -void would_dump(struct linux_binprm *bprm, struct file *file)
> -{
> -	struct inode *inode = file_inode(file);
> -	if (inode_permission(inode, MAY_READ) < 0) {
> -		struct user_namespace *old, *user_ns;
> -		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
> -
> -		/* Ensure mm->user_ns contains the executable */
> -		user_ns = old = bprm->mm->user_ns;
> -		while ((user_ns != &init_user_ns) &&
> -		       !privileged_wrt_inode_uidgid(user_ns, inode))
> -			user_ns = user_ns->parent;
>  
> -		if (old != user_ns) {
> -			bprm->mm->user_ns = get_user_ns(user_ns);
> -			put_user_ns(old);
> -		}
> -	}
> -}
> -EXPORT_SYMBOL(would_dump);
> -
> -void setup_new_exec(struct linux_binprm * bprm)
> -{
> -	struct task_struct *me = current;
>  	/*
>  	 * Once here, prepare_binrpm() will not be called any more, so
>  	 * the final state of setuid/setgid/fscaps can be merged into the
> @@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm)
>  			bprm->rlim_stack.rlim_cur = _STK_LIM;
>  	}
>  
> -	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> -
>  	me->sas_ss_sp = me->sas_ss_size = 0;
>  
>  	/*
> @@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	else
>  		set_dumpable(current->mm, SUID_DUMP_USER);
>  
> -	arch_setup_new_exec();
>  	perf_event_exec();

What is perf expecting to be able to examine at this point? Does it want
a view of things after arch_setup_new_exec()? (i.e. "final" TIF flags,
mmap layout, etc.) From what I can see, the answer is "no, it's just
resetting counters", so I think this is fine. Maybe double-check with
Steve?

>  	__set_task_comm(me, kbasename(bprm->filename), true);
>  
> -	/* Set the new mm task size. We have to do that late because it may
> -	 * depend on TIF_32BIT which is only updated in flush_thread() on
> -	 * some architectures like powerpc
> -	 */
> -	me->mm->task_size = TASK_SIZE;
> -
>  	/* An exec changes our domain. We are no longer part of the thread
>  	   group */
>  	WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
> @@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm)
>  	 * credentials; any time after this it may be unlocked.
>  	 */
>  	security_bprm_committed_creds(bprm);

Similarly for the LSM hook: is it expecting a post-arch-setup view? I
don't see anything looking at task_size, TIF flags, or anything else;
they seem to be just cleaning up from the old process being replaced, so
again, I think this is okay.

Not visible in this patch, the following things now happen earlier,
which I feel should maybe get called out in the changelog, with,
perhaps, better justification than what I've got here:

bprm->secureexec set/check (looks safe, since it depends on
prepare_binprm()'s security_bprm_set_creds()).

rlim_stack.rlim_cur setting (safe, just needs to happen before
arch_pick_mmap_layout())

dumpable() check (looks safe, BINPRM_FLAGS_ENFORCE_NONDUMP depends on
much earlier would_dump(), and uid/gid depend on earlier calls to
prepare_binprm()'s bprm_fill_uid())

__set_task_comm (looks safe, just dealing with the task name...)

self_exec_id bump (looks safe, I think -- it's still after uid
setting)

flush_signal_handlers() (looks safe -- nothing appears to depend on mm
nor personality)
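The review above is essentially a dependency check on the new ordering. One way to make that reasoning mechanical is to list the steps in their post-patch order and check each "must come before" pair. The step labels are informal names taken from the quoted diff and review, the order is simplified (many steps are omitted), and the helpers are hypothetical. The check covers only the constraints written down, not the kernel itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One "before must precede after" requirement between two named steps. */
struct toy_constraint {
	const char *before;
	const char *after;
};

/* Position of name in order[], or -1 if the step is not listed. */
static int toy_pos(const char *const order[], size_t n, const char *name)
{
	assert(order != NULL && name != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(order[i], name) == 0)
			return (int)i;
	return -1;
}

/* True if every constraint is satisfied by the given order. */
static bool toy_order_ok(const char *const order[], size_t n,
			 const struct toy_constraint *deps, size_t ndeps)
{
	for (size_t i = 0; i < ndeps; i++) {
		int b = toy_pos(order, n, deps[i].before);
		int a = toy_pos(order, n, deps[i].after);

		if (b < 0 || a < 0 || b >= a)
			return false;	/* missing step or wrong order */
	}
	return true;
}

/* Simplified post-patch order, using informal labels for each step. */
static const char *const toy_new_order[] = {
	"prepare_binprm", "would_dump",		/* before the binfmt loader runs */
	"do_close_on_exec", "secureexec", "rlim_stack_clamp",
	"set_dumpable", "perf_event_exec", "set_task_comm",
	"self_exec_id_bump",			/* remaining flush_old_exec steps omitted */
	"set_personality",			/* done by each binfmt */
	"arch_pick_mmap_layout", "arch_setup_new_exec", "task_size",
};

/* Dependencies named in the review, plus the personality ordering. */
static const struct toy_constraint toy_deps[] = {
	{ "prepare_binprm", "secureexec" },
	{ "rlim_stack_clamp", "arch_pick_mmap_layout" },
	{ "would_dump", "set_dumpable" },
	{ "prepare_binprm", "set_dumpable" },
	{ "prepare_binprm", "self_exec_id_bump" },
	{ "set_personality", "arch_pick_mmap_layout" },
	{ "set_personality", "arch_setup_new_exec" },
	{ "set_personality", "task_size" },
};
```

Swapping any personality-dependent step ahead of "set_personality" makes toy_order_ok() return false. This is the kind of regression the review is guarding against.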

> +	return 0;
> +
> +out_unlock:
> +	mutex_unlock(&me->signal->exec_update_mutex);
> +out:
> +	return retval;
> +}
> +EXPORT_SYMBOL(flush_old_exec);
> +
> +void would_dump(struct linux_binprm *bprm, struct file *file)
> +{
> +	struct inode *inode = file_inode(file);
> +	if (inode_permission(inode, MAY_READ) < 0) {
> +		struct user_namespace *old, *user_ns;
> +		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
> +
> +		/* Ensure mm->user_ns contains the executable */
> +		user_ns = old = bprm->mm->user_ns;
> +		while ((user_ns != &init_user_ns) &&
> +		       !privileged_wrt_inode_uidgid(user_ns, inode))
> +			user_ns = user_ns->parent;
> +
> +		if (old != user_ns) {
> +			bprm->mm->user_ns = get_user_ns(user_ns);
> +			put_user_ns(old);
> +		}
> +	}
> +}
> +EXPORT_SYMBOL(would_dump);

The diff helpfully decided this moved would_dump(). ;) Is it worth
maybe just moving it explicitly above flush_old_exec() to avoid this
churn? I dunno.
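The loop in would_dump() in the quoted diff climbs the user namespace hierarchy from `bprm->mm->user_ns` until it reaches `init_user_ns` or a namespace that passes privileged_wrt_inode_uidgid(). Below is a simplified model of just that walk. The `toy_` types are hypothetical, a boolean stands in for the privilege check, and the reference counting (get_user_ns()/put_user_ns()) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified user namespace: a parent link plus one flag. */
struct toy_user_ns {
	const struct toy_user_ns *parent;	/* NULL only for the initial namespace */
	bool privileged_over_file;		/* stand-in for privileged_wrt_inode_uidgid() */
};

/*
 * Models the loop in would_dump(): climb from the starting namespace toward
 * init_ns, stopping at the first namespace that passes the privilege check.
 * init_ns is always an acceptable answer, so the walk terminates.
 */
static const struct toy_user_ns *
toy_pick_dump_ns(const struct toy_user_ns *start,
		 const struct toy_user_ns *init_ns)
{
	const struct toy_user_ns *ns = start;

	while (ns != init_ns && !ns->privileged_over_file) {
		assert(ns->parent != NULL);	/* only init_ns lacks a parent */
		ns = ns->parent;
	}
	return ns;
}
```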

> +
> +void setup_new_exec(struct linux_binprm * bprm)
> +{
> +	/* Setup things that can depend upon the personality */

Should this comment be above the function instead?

> +	struct task_struct *me = current;
> +
> +	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> +
> +	arch_setup_new_exec();
> +
> +	/* Set the new mm task size. We have to do that late because it may
> +	 * depend on TIF_32BIT which is only updated in flush_thread() on
> +	 * some architectures like powerpc
> +	 */
> +	me->mm->task_size = TASK_SIZE;
>  	mutex_unlock(&me->signal->exec_update_mutex);
>  	mutex_unlock(&me->signal->cred_guard_mutex);
>  }
> -- 
> 2.20.1
> 

So, as I say, I *think* this is okay, but I always get suspicious about
reordering things in execve(). ;)

So, with a bit larger changelog discussing what's moving "earlier",
I think this looks good:

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook
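The begin / set_personality / setup split discussed in this patch can be sketched as a userspace model. Work that does not look at the personality runs first. Anything derived from the personality (here a stand-in for `mm->task_size`) runs only after the personality is chosen. All names and both address-space sizes are hypothetical illustrations, not the kernel's values.

```c
#include <assert.h>

/* Hypothetical personalities for this model only. */
enum toy_personality { TOY_PER_64BIT, TOY_PER_32BIT };

/* Arbitrary illustrative address-space limits, not real TASK_SIZE values. */
#define TOY_TASK_SIZE_64 (1ULL << 47)
#define TOY_TASK_SIZE_32 (1ULL << 32)

struct toy_task {
	enum toy_personality personality;
	unsigned long long task_size;	/* stands in for mm->task_size */
	int old_state_flushed;		/* stands in for e.g. do_close_on_exec() */
};

/* Stand-in for TASK_SIZE, which can depend on the personality (TIF_32BIT). */
static unsigned long long toy_task_size(const struct toy_task *t)
{
	return t->personality == TOY_PER_32BIT ? TOY_TASK_SIZE_32 : TOY_TASK_SIZE_64;
}

/* Personality-independent work: safe to run before set_personality(). */
static int toy_begin_new_exec(struct toy_task *t)
{
	t->old_state_flushed = 1;
	return 0;
}

/* Personality-dependent work: only meaningful after set_personality(). */
static void toy_setup_new_exec(struct toy_task *t)
{
	assert(t->old_state_flushed);	/* the begin step must have run first */
	t->task_size = toy_task_size(t);
}

/* The loader idiom: begin, choose the personality, then finish setup. */
static int toy_load_binary(struct toy_task *t, enum toy_personality p)
{
	int retval = toy_begin_new_exec(t);

	if (retval)
		return retval;
	t->personality = p;		/* the binfmt's set_personality(...) */
	toy_setup_new_exec(t);
	return 0;
}
```

In this model, if toy_task_size() were evaluated inside toy_begin_new_exec() instead, it would still see the previous personality. That is why the task_size assignment stays on the setup side of the split.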

* Re: [PATCH 7/7] exec: Rename flush_old_exec begin_new_exec
  2020-05-05 19:46 ` [PATCH 7/7] exec: Rename flush_old_exec begin_new_exec Eric W. Biederman
@ 2020-05-05 21:30   ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-05 21:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Tue, May 05, 2020 at 02:46:01PM -0500, Eric W. Biederman wrote:
> 
> There is, and has been for a very long time, a lot more going on in
> flush_old_exec than just flushing the old state.  After the movement
> of code from setup_new_exec there is a whole lot more going on than
> just flushing the old executable's state.
> 
> Rename flush_old_exec to begin_new_exec to more accurately reflect
> what this function does.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: exec: Promised cleanups after introducing exec_update_mutex
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
                   ` (6 preceding siblings ...)
  2020-05-05 19:46 ` [PATCH 7/7] exec: Rename flush_old_exec begin_new_exec Eric W. Biederman
@ 2020-05-06 12:41 ` Greg Ungerer
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
  8 siblings, 0 replies; 122+ messages in thread
From: Greg Ungerer @ 2020-05-06 12:41 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton

Hi Eric,

On 6/5/20 5:39 am, Eric W. Biederman wrote:
> In the patchset that introduced exec_update_mutex there were a few last
> minute discoveries and fixes that left the code in a state that can
> very easily be improved.
> 
> During the merge window we discussed the first three of these patches
> and I promised I would resend them.
> 
> What the first patch does is make the calls in the binfmts:
> 	flush_old_exec();
>          /* set the personality */
>          setup_new_exec();
>          install_exec_creds();
> 
> With no sleeps or anything in between.
> 
> At the conclusion of this set of changes the calls in the binfmts
> are:
> 	begin_new_exec();
>          /* set the personality */
>          setup_new_exec();
> 
> The intent is to make the code easier to follow and easier to change.
> 
> Eric W. Biederman (7):
>        binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf
>        exec: Make unlocking exec_update_mutex explict
>        exec: Rename the flag called_exec_mmap point_of_no_return
>        exec: Merge install_exec_creds into setup_new_exec
>        exec: In setup_new_exec cache current in the local variable me
>        exec: Move most of setup_new_exec into flush_old_exec
>        exec: Rename flush_old_exec begin_new_exec
> 
>   Documentation/trace/ftrace.rst |   2 +-
>   arch/x86/ia32/ia32_aout.c      |   4 +-
>   fs/binfmt_aout.c               |   3 +-
>   fs/binfmt_elf.c                |   3 +-
>   fs/binfmt_elf_fdpic.c          |   3 +-
>   fs/binfmt_flat.c               |   4 +-
>   fs/exec.c                      | 162 ++++++++++++++++++++---------------------
>   include/linux/binfmts.h        |  10 +--
>   kernel/events/core.c           |   2 +-
>   9 files changed, 92 insertions(+), 101 deletions(-)

I tested the whole series on non-MMU m68k and non-MMU arm
(exercising binfmt_flat) and it all tested out with no problems,
so for the binfmt_flat changes:

Tested-by: Greg Ungerer <gerg@linux-m68k.org>

I reviewed the whole series too, and it looks good to me:

Reviewed-by: Greg Ungerer <gerg@linux-m68k.org>

Regards
Greg


> ---
> 
> These changes are against v5.7-rc3.
> 
> My intention, once everything passes code review, is to place these
> changes in a topic branch in my tree and then into linux-next, and
> eventually to send Linus a pull when the next merge window opens.
> Unless someone has a better idea.
> 
> I am a little concerned that I might conflict with the ongoing coredump
> cleanups.
> 
> I have several follow up sets of changes with additional cleanups as
> well but I am trying to keep everything small enough that the code can
> be reviewed.
> 
> After enough cleanups I hope to reopen the conversation of dealing with
> the livelock situation with cred_guard_mutex, as I think figuring out
> what to do becomes much easier once several of my planned
> cleanups/improvements have been made.
> 
> But ultimately I just want to get exec to the point where, when
> we have discussions on how to make exec better, the code is in good
> enough shape that we can actually address the issues we see.
> 
> Eric
> 

* Re: [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf
  2020-05-05 19:41 ` [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf Eric W. Biederman
  2020-05-05 20:45   ` Kees Cook
@ 2020-05-06 12:42   ` Greg Ungerer
  2020-05-06 12:56     ` Eric W. Biederman
  1 sibling, 1 reply; 122+ messages in thread
From: Greg Ungerer @ 2020-05-06 12:42 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton

One small nit:

On 6/5/20 5:41 am, Eric W. Biederman wrote:
> In 2016 Linus moved install_exec_creds to immediately after
> setup_new_exec in binfmt_elf, as a cleanup and as part of closing a
> potential information leak.
> 
> Perform the same cleanup for the other binary formats.
> 
> Different binary formats doing the same things the same way makes exec
> easier to reason about and easier to maintain.
> 
> The binfmt_flagt bits were tested by Greg Ungerer <gerg@linux-m68k.org>
              ^^^^^
              flat

Regards
Greg


> Ref: 9f834ec18def ("binfmt_elf: switch to new creds when switching to new mm")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>   arch/x86/ia32/ia32_aout.c | 3 +--
>   fs/binfmt_aout.c          | 2 +-
>   fs/binfmt_elf_fdpic.c     | 2 +-
>   fs/binfmt_flat.c          | 3 +--
>   4 files changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
> index 9bb71abd66bd..37b36a8ce5fa 100644
> --- a/arch/x86/ia32/ia32_aout.c
> +++ b/arch/x86/ia32/ia32_aout.c
> @@ -140,6 +140,7 @@ static int load_aout_binary(struct linux_binprm *bprm)
>   	set_personality_ia32(false);
>   
>   	setup_new_exec(bprm);
> +	install_exec_creds(bprm);
>   
>   	regs->cs = __USER32_CS;
>   	regs->r8 = regs->r9 = regs->r10 = regs->r11 = regs->r12 =
> @@ -156,8 +157,6 @@ static int load_aout_binary(struct linux_binprm *bprm)
>   	if (retval < 0)
>   		return retval;
>   
> -	install_exec_creds(bprm);
> -
>   	if (N_MAGIC(ex) == OMAGIC) {
>   		unsigned long text_addr, map_size;
>   
> diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
> index 8e8346a81723..ace587b66904 100644
> --- a/fs/binfmt_aout.c
> +++ b/fs/binfmt_aout.c
> @@ -162,6 +162,7 @@ static int load_aout_binary(struct linux_binprm * bprm)
>   	set_personality(PER_LINUX);
>   #endif
>   	setup_new_exec(bprm);
> +	install_exec_creds(bprm);
>   
>   	current->mm->end_code = ex.a_text +
>   		(current->mm->start_code = N_TXTADDR(ex));
> @@ -174,7 +175,6 @@ static int load_aout_binary(struct linux_binprm * bprm)
>   	if (retval < 0)
>   		return retval;
>   
> -	install_exec_creds(bprm);
>   
>   	if (N_MAGIC(ex) == OMAGIC) {
>   		unsigned long text_addr, map_size;
> diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
> index 240f66663543..6c94c6d53d97 100644
> --- a/fs/binfmt_elf_fdpic.c
> +++ b/fs/binfmt_elf_fdpic.c
> @@ -353,6 +353,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
>   		current->personality |= READ_IMPLIES_EXEC;
>   
>   	setup_new_exec(bprm);
> +	install_exec_creds(bprm);
>   
>   	set_binfmt(&elf_fdpic_format);
>   
> @@ -434,7 +435,6 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
>   	current->mm->start_stack = current->mm->start_brk + stack_size;
>   #endif
>   
> -	install_exec_creds(bprm);
>   	if (create_elf_fdpic_tables(bprm, current->mm,
>   				    &exec_params, &interp_params) < 0)
>   		goto error;
> diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
> index 831a2b25ba79..1a1d1fcb893f 100644
> --- a/fs/binfmt_flat.c
> +++ b/fs/binfmt_flat.c
> @@ -541,6 +541,7 @@ static int load_flat_file(struct linux_binprm *bprm,
>   		/* OK, This is the point of no return */
>   		set_personality(PER_LINUX_32BIT);
>   		setup_new_exec(bprm);
> +		install_exec_creds(bprm);
>   	}
>   
>   	/*
> @@ -963,8 +964,6 @@ static int load_flat_binary(struct linux_binprm *bprm)
>   		}
>   	}
>   
> -	install_exec_creds(bprm);
> -
>   	set_binfmt(&flat_format);
>   
>   #ifdef CONFIG_MMU
> 

* Re: [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf
  2020-05-06 12:42   ` Greg Ungerer
@ 2020-05-06 12:56     ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-06 12:56 UTC (permalink / raw)
  To: Greg Ungerer
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Rob Landley, Bernd Edlinger, linux-fsdevel, Al Viro,
	Alexey Dobriyan, Andrew Morton

Greg Ungerer <gerg@linux-m68k.org> writes:

> One small nit:

Good point.

> On 6/5/20 5:41 am, Eric W. Biederman wrote:
>> In 2016 Linus moved install_exec_creds to immediately after
>> setup_new_exec in binfmt_elf, as a cleanup and as part of closing a
>> potential information leak.
>>
>> Perform the same cleanup for the other binary formats.
>>
>> Different binary formats doing the same things the same way makes exec
>> easier to reason about and easier to maintain.
>>
>> The binfmt_flagt bits were tested by Greg Ungerer <gerg@linux-m68k.org>
>              ^^^^^
>              flat
>
> Regards
> Greg
>
>
>> Ref: 9f834ec18def ("binfmt_elf: switch to new creds when switching to new mm")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>   arch/x86/ia32/ia32_aout.c | 3 +--
>>   fs/binfmt_aout.c          | 2 +-
>>   fs/binfmt_elf_fdpic.c     | 2 +-
>>   fs/binfmt_flat.c          | 3 +--
>>   4 files changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
>> index 9bb71abd66bd..37b36a8ce5fa 100644
>> --- a/arch/x86/ia32/ia32_aout.c
>> +++ b/arch/x86/ia32/ia32_aout.c
>> @@ -140,6 +140,7 @@ static int load_aout_binary(struct linux_binprm *bprm)
>>   	set_personality_ia32(false);
>>   
>>   	setup_new_exec(bprm);
>> +	install_exec_creds(bprm);
>>   
>>   	regs->cs = __USER32_CS;
>>   	regs->r8 = regs->r9 = regs->r10 = regs->r11 = regs->r12 =
>> @@ -156,8 +157,6 @@ static int load_aout_binary(struct linux_binprm *bprm)
>>   	if (retval < 0)
>>   		return retval;
>>   
>> -	install_exec_creds(bprm);
>> -
>>   	if (N_MAGIC(ex) == OMAGIC) {
>>   		unsigned long text_addr, map_size;
>>   
>> diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
>> index 8e8346a81723..ace587b66904 100644
>> --- a/fs/binfmt_aout.c
>> +++ b/fs/binfmt_aout.c
>> @@ -162,6 +162,7 @@ static int load_aout_binary(struct linux_binprm * bprm)
>>   	set_personality(PER_LINUX);
>>   #endif
>>   	setup_new_exec(bprm);
>> +	install_exec_creds(bprm);
>>   
>>   	current->mm->end_code = ex.a_text +
>>   		(current->mm->start_code = N_TXTADDR(ex));
>> @@ -174,7 +175,6 @@ static int load_aout_binary(struct linux_binprm * bprm)
>>   	if (retval < 0)
>>   		return retval;
>>   
>> -	install_exec_creds(bprm);
>>   
>>   	if (N_MAGIC(ex) == OMAGIC) {
>>   		unsigned long text_addr, map_size;
>> diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
>> index 240f66663543..6c94c6d53d97 100644
>> --- a/fs/binfmt_elf_fdpic.c
>> +++ b/fs/binfmt_elf_fdpic.c
>> @@ -353,6 +353,7 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
>>   		current->personality |= READ_IMPLIES_EXEC;
>>   
>>   	setup_new_exec(bprm);
>> +	install_exec_creds(bprm);
>>   
>>   	set_binfmt(&elf_fdpic_format);
>>   
>> @@ -434,7 +435,6 @@ static int load_elf_fdpic_binary(struct linux_binprm *bprm)
>>   	current->mm->start_stack = current->mm->start_brk + stack_size;
>>   #endif
>>   
>> -	install_exec_creds(bprm);
>>   	if (create_elf_fdpic_tables(bprm, current->mm,
>>   				    &exec_params, &interp_params) < 0)
>>   		goto error;
>> diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
>> index 831a2b25ba79..1a1d1fcb893f 100644
>> --- a/fs/binfmt_flat.c
>> +++ b/fs/binfmt_flat.c
>> @@ -541,6 +541,7 @@ static int load_flat_file(struct linux_binprm *bprm,
>>   		/* OK, This is the point of no return */
>>   		set_personality(PER_LINUX_32BIT);
>>   		setup_new_exec(bprm);
>> +		install_exec_creds(bprm);
>>   	}
>>   
>>   	/*
>> @@ -963,8 +964,6 @@ static int load_flat_binary(struct linux_binprm *bprm)
>>   		}
>>   	}
>>   
>> -	install_exec_creds(bprm);
>> -
>>   	set_binfmt(&flat_format);
>>   
>>   #ifdef CONFIG_MMU
>>

* Re: [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec
  2020-05-05 21:29   ` Kees Cook
@ 2020-05-06 14:57     ` Eric W. Biederman
  2020-05-06 15:30       ` Kees Cook
  2020-05-07 21:51     ` Eric W. Biederman
  1 sibling, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-06 14:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

Kees Cook <keescook@chromium.org> writes:

> On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote:
>> 
>> The current idiom for the callers is:
>> 
>> flush_old_exec(bprm);
>> set_personality(...);
>> setup_new_exec(bprm);
>> 
>> In 2010 Linus split flush_old_exec into flush_old_exec and
>> setup_new_exec, with the intention that setup_new_exec be what is
>> called after the process's new personality is set.
>> 
>> Move the code that doesn't depend upon the personality from
>> setup_new_exec into flush_old_exec.  This is to facilitate future
>> changes by having as much code together in one function as possible.
>
> Er, I *think* this is okay, but I have some questions below which
> maybe you already investigated (and should perhaps get called out in
> the changelog).

I will see if I can expand more on the review that I have done.

I saw this as moving three lines and the personality setting later in the
code, rather than moving a bunch of lines up.

AKA these lines:
>> +	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
>> +
>> +	arch_setup_new_exec();
>> +
>> +	/* Set the new mm task size. We have to do that late because it may
>> +	 * depend on TIF_32BIT which is only updated in flush_thread() on
>> +	 * some architectures like powerpc
>> +	 */
>> +	me->mm->task_size = TASK_SIZE;


I verified carefully that only those three lines can depend upon the
personality changes.

Your concern, whether anything depends on those moved lines, is something
I haven't looked at as closely, so I will go back through and do that.  I
don't actually expect anything depends upon those three lines because they
should only be changing architecture-specific state.  But that is general
handwaving, not actually careful review, which tends to turn up surprises
in exec.

Speaking of which, while I was looking through the LSM hooks again I just
realized that 613cc2b6f272 ("fs: exec: apply CLOEXEC before changing
dumpable task flags") only fixed half the problem.  So I am going to
take a quick detour to fix that, then come back to this, as it directly
affects this code motion.

Eric

* Re: [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec
  2020-05-06 14:57     ` Eric W. Biederman
@ 2020-05-06 15:30       ` Kees Cook
  2020-05-07 19:51         ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-06 15:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Wed, May 06, 2020 at 09:57:10AM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote:
> >> 
> >> The current idiom for the callers is:
> >> 
> >> flush_old_exec(bprm);
> >> set_personality(...);
> >> setup_new_exec(bprm);
> >> 
> >> In 2010 Linus split flush_old_exec into flush_old_exec and
> >> setup_new_exec, with the intention that setup_new_exec be what is
> >> called after the process's new personality is set.
> >> 
> >> Move the code that doesn't depend upon the personality from
> >> setup_new_exec into flush_old_exec.  This is to facilitate future
> >> changes by having as much code together in one function as possible.
> >
> > Er, I *think* this is okay, but I have some questions below which
> > maybe you already investigated (and should perhaps get called out in
> > the changelog).
> 
> I will see if I can expand more on the review that I have done.
> 
> I saw this as moving three lines and the personality setting later in
> the code, rather than moving a bunch of lines up.
> 
> AKA these lines:
> >> +	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> >> +
> >> +	arch_setup_new_exec();
> >> +
> >> +	/* Set the new mm task size. We have to do that late because it may
> >> +	 * depend on TIF_32BIT which is only updated in flush_thread() on
> >> +	 * some architectures like powerpc
> >> +	 */
> >> +	me->mm->task_size = TASK_SIZE;
> 
> 
> I verified carefully that only those three lines can depend upon the
> personality changes.
> 
> As for your concern about whether anything depends on those moved
> lines: I haven't looked at that so closely, so I will go back through
> and do that.  I don't actually expect anything to depend upon those
> three lines because they should only be changing architecture-specific
> state.  But that is general handwaving, not the careful review that
> tends to turn up surprises in exec.

Right -- I looked through all of it (see my last email) and I think it's
all okay, but I was curious if you'd looked too. :)

> Speaking of which, while I was looking through the lsm hooks again I
> just realized that 613cc2b6f272 ("fs: exec: apply CLOEXEC before changing
> dumpable task flags") only fixed half the problem.  So I am going to
> take a quick detour to fix that and then come back to this, as that
> directly affects this code motion.

Oh yay. :) Thanks for catching it!

-- 
Kees Cook


* Re: [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec
  2020-05-06 15:30       ` Kees Cook
@ 2020-05-07 19:51         ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-07 19:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

Kees Cook <keescook@chromium.org> writes:

> On Wed, May 06, 2020 at 09:57:10AM -0500, Eric W. Biederman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>> 
>> > On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote:
>> >> 
>> >> The current idiom for the callers is:
>> >> 
>> >> flush_old_exec(bprm);
>> >> set_personality(...);
>> >> setup_new_exec(bprm);
>> >> 
>> >> In 2010 Linus split flush_old_exec into flush_old_exec and
>> >> setup_new_exec.  With the intention that setup_new_exec be what is
>> >> called after the processes new personality is set.
>> >> 
>> >> Move the code that doesn't depend upon the personality from
>> >> setup_new_exec into flush_old_exec.  This is to facilitate future
>> >> changes by having as much code together in one function as possible.
>> >
>> > Er, I *think* this is okay, but I have some questions below which
>> > maybe you already investigated (and should perhaps get called out in
>> > the changelog).
>> 
>> I will see if I can expand more on the review that I have done.
>> 
>> I saw this as moving three lines and the personality setting later in
>> the code, rather than moving a bunch of lines up.
>> 
>> AKA these lines:
>> >> +	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
>> >> +
>> >> +	arch_setup_new_exec();
>> >> +
>> >> +	/* Set the new mm task size. We have to do that late because it may
>> >> +	 * depend on TIF_32BIT which is only updated in flush_thread() on
>> >> +	 * some architectures like powerpc
>> >> +	 */
>> >> +	me->mm->task_size = TASK_SIZE;
>> 
>> 
>> I verified carefully that only those three lines can depend upon the
>> personality changes.
>> 
>> As for your concern about whether anything depends on those moved
>> lines: I haven't looked at that so closely, so I will go back through
>> and do that.  I don't actually expect anything to depend upon those
>> three lines because they should only be changing architecture-specific
>> state.  But that is general handwaving, not the careful review that
>> tends to turn up surprises in exec.
>
> Right -- I looked through all of it (see my last email) and I think it's
> all okay, but I was curious if you'd looked too. :)

I had, and I will finish looking in the other direction to see if
there is anything else I can find.

Thank you for asking and keeping me honest.  There are so many moving
parts to this code it is easy to overlook something by accident.

>> Speaking of which, while I was looking through the lsm hooks again I
>> just realized that 613cc2b6f272 ("fs: exec: apply CLOEXEC before changing
>> dumpable task flags") only fixed half the problem.  So I am going to
>> take a quick detour to fix that and then come back to this, as that
>> directly affects this code motion.
>
> Oh yay. :) Thanks for catching it!

Well, that fix is going to be a lot more involved than I anticipated.
The more I looked, the more bugs I found, so I will revisit fixing that
after I complete this set of changes.  I thought it was going to be a
trivial, localized fix; unfortunately it is not.

Eric


* Re: [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec
  2020-05-05 21:29   ` Kees Cook
  2020-05-06 14:57     ` Eric W. Biederman
@ 2020-05-07 21:51     ` Eric W. Biederman
  2020-05-08  5:50       ` Kees Cook
  1 sibling, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-07 21:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

Kees Cook <keescook@chromium.org> writes:

> On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote:
>> 
>> The current idiom for the callers is:
>> 
>> flush_old_exec(bprm);
>> set_personality(...);
>> setup_new_exec(bprm);
>> 
>> In 2010 Linus split flush_old_exec into flush_old_exec and
>> setup_new_exec.  With the intention that setup_new_exec be what is
>> called after the processes new personality is set.
>> 
>> Move the code that doesn't depend upon the personality from
>> setup_new_exec into flush_old_exec.  This is to facilitate future
>> changes by having as much code together in one function as possible.
>
> Er, I *think* this is okay, but I have some questions below which
> maybe you already investigated (and should perhaps get called out in
> the changelog).

I intend to add the following text to the changelog.  At this point I
believe I have read through everything and nothing raises any concerns
for me:

--- text begin ---

To see why it is safe to move this code, please note that this change
effectively moves the personality setting in the binfmt, and the
following three lines of code, after everything except unlocking the
mutexes:
        arch_pick_mmap_layout
        arch_setup_new_exec
        mm->task_size = TASK_SIZE

The function arch_pick_mmap_layout at most sets:
        mm->get_unmapped_area
        mm->mmap_base
        mm->mmap_legacy_base
        mm->mmap_compat_base
        mm->mmap_compat_legacy_base
which nothing in flush_old_exec or setup_new_exec depends on.

The function arch_setup_new_exec only sets architecture-specific
state, and the rest of the functions only deal in state that applies
to all architectures.

The last line just sets mm->task_size, and again nothing in
flush_old_exec or setup_new_exec depends on task_size.

--- text end ---
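
The safety argument above is essentially a read/write-set argument: a group of steps can be moved later past other steps as long as neither side reads what the other writes and they do not write the same state. A small sketch that encodes that check; the state bits, struct model_step and the helper names are hypothetical, and the read/write sets used for the generic steps simply restate the claims in the text rather than anything derived from fs/exec.c. Only (moved step, passed step) pairs are checked, because the moved steps keep their relative order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bits for the state named in the changelog text. */
enum model_state {
	ST_PERSONALITY   = 1 << 0,
	ST_MMAP_LAYOUT   = 1 << 1,  /* get_unmapped_area and the mmap_*base fields */
	ST_ARCH_SPECIFIC = 1 << 2,  /* whatever arch_setup_new_exec() touches */
	ST_TASK_SIZE     = 1 << 3,
	ST_GENERIC       = 1 << 4,  /* state that applies to all architectures */
};

struct model_step {
	const char *name;
	unsigned reads;   /* bitmask of enum model_state */
	unsigned writes;  /* bitmask of enum model_state */
};

/* Two steps may be swapped when there is no read/write or write/write overlap. */
static bool model_steps_commute(const struct model_step *a,
				const struct model_step *b)
{
	return !(a->writes & b->reads) &&  /* b does not read a's output */
	       !(a->reads & b->writes) &&  /* a does not read b's output */
	       !(a->writes & b->writes);   /* no competing writes */
}

/* Moving every step in 'moved' after every step in 'passed' is safe if
 * each (moved, passed) pair commutes. */
static bool model_move_later_is_safe(const struct model_step *moved, size_t n_moved,
				     const struct model_step *passed, size_t n_passed)
{
	assert(n_moved == 0 || moved != NULL);
	assert(n_passed == 0 || passed != NULL);
	for (size_t i = 0; i < n_moved; i++)
		for (size_t j = 0; j < n_passed; j++)
			if (!model_steps_commute(&moved[i], &passed[j]))
				return false;
	return true;
}
```

With the moved steps given the sets the text describes, the check passes; adding a hypothetical step that reads ST_TASK_SIZE makes it fail, which is the kind of dependency the review was looking for.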

>> 
>> Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  fs/exec.c | 85 ++++++++++++++++++++++++++++---------------------------
>>  1 file changed, 44 insertions(+), 41 deletions(-)
>> 
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 8c3abafb9bb1..0eff20558735 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm)
>>  	 * undergoing exec(2).
>>  	 */
>>  	do_close_on_exec(me->files);
>> -	return 0;
>> -
>> -out_unlock:
>> -	mutex_unlock(&me->signal->exec_update_mutex);
>> -out:
>> -	return retval;
>> -}
>> -EXPORT_SYMBOL(flush_old_exec);
>> -
>> -void would_dump(struct linux_binprm *bprm, struct file *file)
>> -{
>> -	struct inode *inode = file_inode(file);
>> -	if (inode_permission(inode, MAY_READ) < 0) {
>> -		struct user_namespace *old, *user_ns;
>> -		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
>> -
>> -		/* Ensure mm->user_ns contains the executable */
>> -		user_ns = old = bprm->mm->user_ns;
>> -		while ((user_ns != &init_user_ns) &&
>> -		       !privileged_wrt_inode_uidgid(user_ns, inode))
>> -			user_ns = user_ns->parent;
>>  
>> -		if (old != user_ns) {
>> -			bprm->mm->user_ns = get_user_ns(user_ns);
>> -			put_user_ns(old);
>> -		}
>> -	}
>> -}
>> -EXPORT_SYMBOL(would_dump);
>> -
>> -void setup_new_exec(struct linux_binprm * bprm)
>> -{
>> -	struct task_struct *me = current;
>>  	/*
>>  	 * Once here, prepare_binrpm() will not be called any more, so
>>  	 * the final state of setuid/setgid/fscaps can be merged into the
>> @@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm)
>>  			bprm->rlim_stack.rlim_cur = _STK_LIM;
>>  	}
>>  
>> -	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
>> -
>>  	me->sas_ss_sp = me->sas_ss_size = 0;
>>  
>>  	/*
>> @@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm)
>>  	else
>>  		set_dumpable(current->mm, SUID_DUMP_USER);
>>  
>> -	arch_setup_new_exec();
>>  	perf_event_exec();
>
> What is perf expecting to be able to examine at this point? Does it want
> a view of things after arch_setup_new_exec()? (i.e. "final" TIF flags,
> mmap layout, etc.) From what I can see, the answer is "no, it's just
> resetting counters", so I think this is fine. Maybe double-check with
> Steve?

I can't find anything in the perf code that depends on
arch_pick_mmap_layout or mm->task_size.  So I don't have any concerns.
I have grepped through kernel/events/, arch/x86/events/ and
include/trace to double-check, and nothing turned up.

I can't see the policy of where things will be allocated in the
memory map making any difference to perf.

Depending on what the events actually are, I can imagine them firing
and having issues, as an event can be just about anything, but I
don't see a way to prevent that.

I do see perf disabling events that are based on addresses.  I further
see perf enabling/disabling events that have already been computed.  I
see perf treating exec effectively as a process scheduling out and in.

Then finally I see perf shutting itself down on suid exec, and
generating some final perf events.  I have some concerns that this is
a bit late, and that the test might not be quite right, but nothing
particular to this change.

>>  	__set_task_comm(me, kbasename(bprm->filename), true);
>>  
>> -	/* Set the new mm task size. We have to do that late because it may
>> -	 * depend on TIF_32BIT which is only updated in flush_thread() on
>> -	 * some architectures like powerpc
>> -	 */
>> -	me->mm->task_size = TASK_SIZE;
>> -
>>  	/* An exec changes our domain. We are no longer part of the thread
>>  	   group */
>>  	WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
>> @@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm)
>>  	 * credentials; any time after this it may be unlocked.
>>  	 */
>>  	security_bprm_committed_creds(bprm);
>
> Similarly for the LSM hook: is it expecting a post-arch-setup view? I
> don't see anything looking at task_size, TIF flags, or anything else;
> they seem to be just cleaning up from the old process being replaced, so
> again, I think this is okay.

Nothing at all with the mm.  The LSM hooks close files and
muck with rlimits and signals, and tidy up their lsm state.

There are only 3 implementations (apparmor, tomoyo and selinux),
so it isn't too hard to read through them.

> Not visible in this patch, the following things now happen earlier,
> which I feel should maybe get called out in the changelog, with,
> perhaps, better justification than what I've got here:
>
> bprm->secureexec set/check (looks safe, since it depends on
> prepare_binprm()'s security_bprm_set_creds())
>
> rlim_stack.rlim_cur setting (safe, just needs to happen before
> arch_pick_mmap_layout())
>
> dumpable() check (looks safe, BINPRM_FLAGS_ENFORCE_NONDUMP depends on
> much earlier would_dump(), and uid/gid depend on earlier calls to
> prepare_binprm()'s bprm_fill_uid())
>
> __set_task_comm (looks safe, just dealing with the task name...)
>
> self_exec_id bump (looks safe, but I think -- it's still after uid
> setting)
>
> flush_signal_handlers() (looks safe -- nothing appears to depend on mm
> nor personality)

Agreed.

>> +	return 0;
>> +
>> +out_unlock:
>> +	mutex_unlock(&me->signal->exec_update_mutex);
>> +out:
>> +	return retval;
>> +}
>> +EXPORT_SYMBOL(flush_old_exec);
>> +
>> +void would_dump(struct linux_binprm *bprm, struct file *file)
>> +{
>> +	struct inode *inode = file_inode(file);
>> +	if (inode_permission(inode, MAY_READ) < 0) {
>> +		struct user_namespace *old, *user_ns;
>> +		bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
>> +
>> +		/* Ensure mm->user_ns contains the executable */
>> +		user_ns = old = bprm->mm->user_ns;
>> +		while ((user_ns != &init_user_ns) &&
>> +		       !privileged_wrt_inode_uidgid(user_ns, inode))
>> +			user_ns = user_ns->parent;
>> +
>> +		if (old != user_ns) {
>> +			bprm->mm->user_ns = get_user_ns(user_ns);
>> +			put_user_ns(old);
>> +		}
>> +	}
>> +}
>> +EXPORT_SYMBOL(would_dump);
>
> The diff helpfully decided this moved would_dump(). ;) Is it worth
> maybe just moving it explicitly above flush_old_exec() to avoid this
> churn? I dunno.

Given the amount of review and testing that has been put in at
this point I don't think so.

>> +
>> +void setup_new_exec(struct linux_binprm * bprm)
>> +{
>> +	/* Setup things that can depend upon the personality */
>
> Should this comment be above the function instead?

My experience has been that comments above functions, unless they are
in full linuxdoc, tend to be less well maintained than comments within
the function itself.  So I don't think it is worth moving.

>> +	struct task_struct *me = current;
>> +
>> +	arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
>> +
>> +	arch_setup_new_exec();
>> +
>> +	/* Set the new mm task size. We have to do that late because it may
>> +	 * depend on TIF_32BIT which is only updated in flush_thread() on
>> +	 * some architectures like powerpc
>> +	 */
>> +	me->mm->task_size = TASK_SIZE;
>>  	mutex_unlock(&me->signal->exec_update_mutex);
>>  	mutex_unlock(&me->signal->cred_guard_mutex);
>>  }
>> -- 
>> 2.20.1
>> 
>
> So, as I say, I *think* this is okay, but I always get suspicious about
> reordering things in execve(). ;)
>
> So, with a bit larger changelog discussing what's moving "earlier",
> I think this looks good:

Please see above.

Eric


* Re: [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec
  2020-05-07 21:51     ` Eric W. Biederman
@ 2020-05-08  5:50       ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-08  5:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Thu, May 07, 2020 at 04:51:13PM -0500, Eric W. Biederman wrote:
> I intend to add the following text to the changelog.  At this point I
> believe I have read through everything and nothing raises any concerns
> for me:
> 
> --- text begin ---
> 
> To see why it is safe to move this code, please note that this change
> effectively moves the personality setting in the binfmt, and the
> following three lines of code, after everything except unlocking the
> mutexes:
>         arch_pick_mmap_layout
>         arch_setup_new_exec
>         mm->task_size = TASK_SIZE
> 
> The function arch_pick_mmap_layout at most sets:
>         mm->get_unmapped_area
>         mm->mmap_base
>         mm->mmap_legacy_base
>         mm->mmap_compat_base
>         mm->mmap_compat_legacy_base
> which nothing in flush_old_exec or setup_new_exec depends on.
> 
> The function arch_setup_new_exec only sets architecture-specific
> state, and the rest of the functions only deal in state that applies
> to all architectures.
> 
> The last line just sets mm->task_size, and again nothing in
> flush_old_exec or setup_new_exec depends on task_size.
> 
> --- text end ---
> [...]
> > So, with a bit larger changelog discussing what's moving "earlier",
> > I think this looks good:
> 
> Please see above.

Awesome! Thanks for checking my checking of your checking. ;)

Acked-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook


* [PATCH 0/6] exec: Trivial cleanups for exec
  2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
                   ` (7 preceding siblings ...)
  2020-05-06 12:41 ` exec: Promised cleanups after introducing exec_update_mutex Greg Ungerer
@ 2020-05-08 18:43 ` Eric W. Biederman
  2020-05-08 18:44   ` [PATCH 1/6] exec: Move the comment from above de_thread to above unshare_sighand Eric W. Biederman
                     ` (6 more replies)
  8 siblings, 7 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-08 18:43 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


This is a continuation of my work to clean up exec so its more
difficult problems are approachable.

The changes correct some comments, stop open coding mutex_lock_killable,
and move the point_of_no_return variable up to when the
point_of_no_return actually occurs.

I don't think there is anything controversial in there but if you see
something please let me know.

Eric W. Biederman (6):
      exec: Move the comment from above de_thread to above unshare_sighand
      exec: Fix spelling of search_binary_handler in a comment
      exec: Stop open coding mutex_lock_killable of cred_guard_mutex
      exec: Run sync_mm_rss before taking exec_update_mutex
      exec: Move handling of the point of no return to the top level
      exec: Set the point of no return sooner

 fs/exec.c       | 51 +++++++++++++++++++++++++++------------------------
 kernel/ptrace.c |  4 ++--
 2 files changed, 29 insertions(+), 26 deletions(-)


* [PATCH 1/6] exec: Move the comment from above de_thread to above unshare_sighand
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
@ 2020-05-08 18:44   ` Eric W. Biederman
  2020-05-09  5:02     ` Kees Cook
  2020-05-08 18:44   ` [PATCH 2/6] exec: Fix spelling of search_binary_handler in a comment Eric W. Biederman
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-08 18:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


The comment describes work that now happens in unshare_sighand, so
move the comment to where it makes sense.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 3cc40048cc65..d4387bc92292 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1093,12 +1093,6 @@ static int exec_mmap(struct mm_struct *mm)
 	return 0;
 }
 
-/*
- * This function makes sure the current process has its own signal table,
- * so that flush_signal_handlers can later reset the handlers without
- * disturbing other processes.  (Other processes might share the signal
- * table via the CLONE_SIGHAND option to clone().)
- */
 static int de_thread(struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
@@ -1240,6 +1234,12 @@ static int de_thread(struct task_struct *tsk)
 }
 
 
+/*
+ * This function makes sure the current process has its own signal table,
+ * so that flush_signal_handlers can later reset the handlers without
+ * disturbing other processes.  (Other processes might share the signal
+ * table via the CLONE_SIGHAND option to clone().)
+ */
 static int unshare_sighand(struct task_struct *me)
 {
 	struct sighand_struct *oldsighand = me->sighand;
-- 
2.20.1
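
The relocated comment describes a copy-on-share pattern: if the signal handler table is shared with another process through CLONE_SIGHAND, exec first gives the current process a private copy so that resetting handlers cannot affect the other sharers. A user-space sketch of that pattern, assuming a plain integer reference count and no locking; struct model_sighand and the model_* functions are hypothetical and omit the refcounting and locking the kernel needs. The sketch resets every handler, whereas flush_signal_handlers() during exec leaves ignored signals ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NSIG 64

/* Hypothetical stand-in for a shared signal handler table. */
struct model_sighand {
	int count;                        /* number of processes sharing it */
	void (*action[MODEL_NSIG])(int);  /* NULL means "default action" */
};

struct model_proc {
	struct model_sighand *sighand;
};

/* Example handler used to show that other sharers keep their setup. */
static void model_custom_handler(int sig)
{
	(void)sig;
}

/* Ensure 'p' owns its table; returns 0 on success, -1 if allocation fails. */
static int model_unshare_sighand(struct model_proc *p)
{
	struct model_sighand *old = p->sighand;

	if (old->count == 1)
		return 0;                 /* already private */

	struct model_sighand *copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;
	memcpy(copy->action, old->action, sizeof(copy->action));
	copy->count = 1;
	old->count--;                     /* drop our share of the old table */
	p->sighand = copy;
	return 0;
}

/* Reset every handler in the now-private table. */
static void model_flush_signal_handlers(struct model_proc *p)
{
	assert(p->sighand->count == 1);   /* must not disturb other sharers */
	for (int i = 0; i < MODEL_NSIG; i++)
		p->sighand->action[i] = NULL;
}
```

Unsharing before flushing is what keeps the other process's handlers intact; flushing the shared table directly would reset them for both processes.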



* [PATCH 2/6] exec: Fix spelling of search_binary_handler in a comment
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
  2020-05-08 18:44   ` [PATCH 1/6] exec: Move the comment from above de_thread to above unshare_sighand Eric W. Biederman
@ 2020-05-08 18:44   ` Eric W. Biederman
  2020-05-09  5:03     ` Kees Cook
  2020-05-08 18:45   ` [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex Eric W. Biederman
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-08 18:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index d4387bc92292..82106241ed53 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1296,7 +1296,7 @@ void __set_task_comm(struct task_struct *tsk, const char *buf, bool exec)
  * Calling this is the point of no return. None of the failures will be
  * seen by userspace since either the process is already taking a fatal
  * signal (via de_thread() or coredump), or will have SEGV raised
- * (after exec_mmap()) by search_binary_handlers (see below).
+ * (after exec_mmap()) by search_binary_handler (see below).
  */
 int begin_new_exec(struct linux_binprm * bprm)
 {
-- 
2.20.1



* [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
  2020-05-08 18:44   ` [PATCH 1/6] exec: Move the comment from above de_thread to above unshare_sighand Eric W. Biederman
  2020-05-08 18:44   ` [PATCH 2/6] exec: Fix spelling of search_binary_handler in a comment Eric W. Biederman
@ 2020-05-08 18:45   ` Eric W. Biederman
  2020-05-09  5:08     ` Kees Cook
  2020-05-09 19:18     ` Linus Torvalds
  2020-05-08 18:45   ` [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex Eric W. Biederman
                     ` (3 subsequent siblings)
  6 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-08 18:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


Oleg modified the code that did
"mutex_lock_interruptible(&current->cred_guard_mutex)" to return
-ERESTARTNOINTR instead of -EINTR, so that userspace will never see a
failure to grab the mutex.

Slightly earlier Liam R. Howlett defined mutex_lock_killable for
exactly the same situation but it does it a little more cleanly.

Switch the code to mutex_lock_killable so that it is clearer what the
code is doing.

Ref: ad776537cc6b ("Add mutex_lock_killable")
Ref: 793285fcafce ("cred_guard_mutex: do not return -EINTR to user-space")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c       | 5 +++--
 kernel/ptrace.c | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 82106241ed53..11a5c073aa35 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1493,8 +1493,9 @@ EXPORT_SYMBOL(finalize_exec);
  */
 static int prepare_bprm_creds(struct linux_binprm *bprm)
 {
-	if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
-		return -ERESTARTNOINTR;
+	int retval = mutex_lock_killable(&current->signal->cred_guard_mutex);
+	if (retval)
+		return retval;
 
 	bprm->cred = prepare_exec_creds();
 	if (likely(bprm->cred))
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179508d6..1876b3392488 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -391,8 +391,8 @@ static int ptrace_attach(struct task_struct *task, long request,
 	 * SUID, SGID and LSM creds get determined differently
 	 * under ptrace.
 	 */
-	retval = -ERESTARTNOINTR;
-	if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
+	retval = mutex_lock_killable(&task->signal->cred_guard_mutex);
+	if (retval)
 		goto out;
 
 	task_lock(task);
-- 
2.20.1
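
The commit message argues that neither version lets userspace see a failure to take cred_guard_mutex. A sketch that models the two versions as functions of which signal arrives while the task waits, assuming the mutex is contended so the wait really happens; the model_* names and the MODEL_* error constants are hypothetical stand-ins. The visibility rule encodes the commit message's reasoning: -ERESTARTNOINTR restarts the system call, and a task with a fatal signal dies before returning to userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the error codes discussed above. */
enum { MODEL_EINTR = 4, MODEL_ERESTARTNOINTR = 513 };

enum model_signal { MODEL_SIG_NONE, MODEL_SIG_NONFATAL, MODEL_SIG_FATAL };

/* Old code: an interruptible wait is broken by any pending signal and
 * the error is rewritten to -ERESTARTNOINTR. */
static int model_old_prepare(enum model_signal sig)
{
	if (sig != MODEL_SIG_NONE)
		return -MODEL_ERESTARTNOINTR;
	return 0;
}

/* New code: a killable wait is broken only by a fatal signal; with a
 * non-fatal signal the task keeps waiting until it gets the mutex. */
static int model_new_prepare(enum model_signal sig)
{
	if (sig == MODEL_SIG_FATAL)
		return -MODEL_EINTR;
	return 0;
}

/* Would userspace ever observe this return value as an error? */
static bool model_error_visible(int ret, enum model_signal sig)
{
	assert(ret <= 0);
	if (sig == MODEL_SIG_FATAL)
		return false;             /* the task dies first */
	if (ret == -MODEL_ERESTARTNOINTR)
		return false;             /* the syscall is restarted */
	return ret < 0;
}
```

The observable difference is the non-fatal case: the old code backed out and restarted the system call, while the killable version keeps waiting for the mutex.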



* [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
                     ` (2 preceding siblings ...)
  2020-05-08 18:45   ` [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex Eric W. Biederman
@ 2020-05-08 18:45   ` Eric W. Biederman
  2020-05-09  5:15     ` Kees Cook
  2020-05-08 18:47   ` [PATCH 5/6] exec: Move handling of the point of no return to the top level Eric W. Biederman
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-08 18:45 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


Like exec_mm_release, sync_mm_rss is about flushing out the state of
the old_mm, which does not need to happen under exec_update_mutex.

Make this explicit by moving sync_mm_rss outside of exec_update_mutex.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 11a5c073aa35..15682a1dfee9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1051,13 +1051,14 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk = current;
 	old_mm = current->mm;
 	exec_mm_release(tsk, old_mm);
+	if (old_mm)
+		sync_mm_rss(old_mm);
 
 	ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
 	if (ret)
 		return ret;
 
 	if (old_mm) {
-		sync_mm_rss(old_mm);
 		/*
 		 * Make sure that if there is a core dump in progress
 		 * for the old mm, we get out and die instead of going
-- 
2.20.1



* [PATCH 5/6] exec: Move handling of the point of no return to the top level
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
                     ` (3 preceding siblings ...)
  2020-05-08 18:45   ` [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex Eric W. Biederman
@ 2020-05-08 18:47   ` Eric W. Biederman
  2020-05-09  5:31     ` Kees Cook
  2020-05-08 18:48   ` [PATCH 6/6] exec: Set the point of no return sooner Eric W. Biederman
  2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
  6 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-08 18:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


Move the handling of the point of no return from search_binary_handler
into __do_execve_file so that it is easier to find, and to keep
things robust in the face of change.

Make it clear that an existing fatal signal will take precedence over
a forced SIGSEGV by not forcing SIGSEGV if a fatal signal is already
pending.  This does not change the behavior but it saves a reader
of the code the tedium of reading and understanding force_sig
and the signal delivery code.

Update the comment in begin_new_exec about where SIGSEGV is forced.

Keep point_of_no_return from being a mystery by documenting
what the code is doing where it forces SIGSEGV if the
code is past the point of no return.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 15682a1dfee9..443eb960f9a0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1329,8 +1329,8 @@ int begin_new_exec(struct linux_binprm * bprm)
 	/*
 	 * With the new mm installed it is completely impossible to
 	 * fail and return to the original process.  If anything from
-	 * here on returns an error, the check in
-	 * search_binary_handler() will SEGV current.
+	 * here on returns an error, the check in __do_execve_file()
+	 * will SEGV current.
 	 */
 	bprm->point_of_no_return = true;
 	bprm->mm = NULL;
@@ -1722,13 +1722,8 @@ int search_binary_handler(struct linux_binprm *bprm)
 
 		read_lock(&binfmt_lock);
 		put_binfmt(fmt);
-		if (retval < 0 && bprm->point_of_no_return) {
-			/* we got to flush_old_exec() and failed after it */
-			read_unlock(&binfmt_lock);
-			force_sigsegv(SIGSEGV);
-			return retval;
-		}
-		if (retval != -ENOEXEC || !bprm->file) {
+		if (bprm->point_of_no_return || !bprm->file ||
+		    (retval != -ENOEXEC)) {
 			read_unlock(&binfmt_lock);
 			return retval;
 		}
@@ -1899,6 +1894,14 @@ static int __do_execve_file(int fd, struct filename *filename,
 	return retval;
 
 out:
+	/*
+	 * If past the point of no return ensure the the code never
+	 * returns to the userspace process.  Use an existing fatal
+	 * signal if present otherwise terminate the process with
+	 * SIGSEGV.
+	 */
+	if (bprm->point_of_no_return && !fatal_signal_pending(current))
+		force_sigsegv(SIGSEGV);
 	if (bprm->mm) {
 		acct_arg_size(bprm, 0);
 		mmput(bprm->mm);
-- 
2.20.1
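
After this patch, search_binary_handler() stops trying further formats once a loader has gone past the point of no return, and only __do_execve_file() decides whether to force SIGSEGV. A user-space model of both pieces; struct model_bprm, the loader callbacks and the model_* functions are hypothetical, and the sketch leaves out binfmt_lock, module loading and the rest of the real loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_ENOEXEC = 8 };  /* local stand-in for ENOEXEC */

struct model_bprm {
	bool point_of_no_return;
	bool has_file;           /* stands in for bprm->file != NULL */
};

typedef int (*model_loader)(struct model_bprm *bprm);

/* Try each loader in turn.  Stop once past the point of no return,
 * when the file has been consumed, or on any result other than
 * -ENOEXEC (including success), mirroring the new condition. */
static int model_search_binary_handler(struct model_bprm *bprm,
				       const model_loader *fmts, size_t n)
{
	int retval = -MODEL_ENOEXEC;

	assert(n == 0 || fmts != NULL);
	for (size_t i = 0; i < n; i++) {
		retval = fmts[i](bprm);
		if (bprm->point_of_no_return || !bprm->has_file ||
		    retval != -MODEL_ENOEXEC)
			return retval;
	}
	return retval;
}

enum model_outcome { OUT_RETURN_ERROR, OUT_FORCE_SIGSEGV, OUT_KEEP_PENDING_SIGNAL };

/* The decision made at the out: label of the model __do_execve_file(). */
static enum model_outcome model_exec_error_path(const struct model_bprm *bprm,
						bool fatal_signal_pending)
{
	if (!bprm->point_of_no_return)
		return OUT_RETURN_ERROR;        /* old process is intact */
	if (fatal_signal_pending)
		return OUT_KEEP_PENDING_SIGNAL; /* existing fatal signal wins */
	return OUT_FORCE_SIGSEGV;
}

/* Example loaders for the model. */
static int model_load_not_mine(struct model_bprm *b) { (void)b; return -MODEL_ENOEXEC; }
static int model_load_ok(struct model_bprm *b) { (void)b; return 0; }
static int model_load_fails_late(struct model_bprm *b)
{
	b->point_of_no_return = true;   /* got past begin_new_exec() */
	return -MODEL_ENOEXEC;          /* even -ENOEXEC must stop the search */
}
```

Putting the SIGSEGV decision in one place means a loader that fails late cannot fall through to the next format, and the fatal-signal precedence is visible without reading the signal delivery code.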



* [PATCH 6/6] exec: Set the point of no return sooner
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
                     ` (4 preceding siblings ...)
  2020-05-08 18:47   ` [PATCH 5/6] exec: Move handling of the point of no return to the top level Eric W. Biederman
@ 2020-05-08 18:48   ` Eric W. Biederman
  2020-05-09  5:33     ` Kees Cook
  2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
  6 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-08 18:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton


Make the code more robust by marking the point of no return sooner.
This ensures that future code changes don't need to worry about how
they return errors if they are past this point.

This results in no actual change in behavior, as __do_execve_file does
not force SIGSEGV when there is a fatal signal pending past the point
of no return.  Further, the only error returns from de_thread and
exec_mmap that can occur result in fatal signals being pending.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 443eb960f9a0..b0620d5ebc66 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1304,6 +1304,11 @@ int begin_new_exec(struct linux_binprm * bprm)
 	struct task_struct *me = current;
 	int retval;
 
+	/*
+	 * Ensure all future errors are fatal.
+	 */
+	bprm->point_of_no_return = true;
+
 	/*
 	 * Make this the only thread in the thread group.
 	 */
@@ -1326,13 +1331,6 @@ int begin_new_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
-	/*
-	 * With the new mm installed it is completely impossible to
-	 * fail and return to the original process.  If anything from
-	 * here on returns an error, the check in __do_execve_file()
-	 * will SEGV current.
-	 */
-	bprm->point_of_no_return = true;
 	bprm->mm = NULL;
 
 #ifdef CONFIG_POSIX_TIMERS
-- 
2.20.1


* Re: [PATCH 1/6] exec: Move the comment from above de_thread to above unshare_sighand
  2020-05-08 18:44   ` [PATCH 1/6] exec: Move the comment from above de_thread to above unshare_sighand Eric W. Biederman
@ 2020-05-09  5:02     ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-09  5:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Fri, May 08, 2020 at 01:44:19PM -0500, Eric W. Biederman wrote:
> 
> The comment describes work that now happens in unshare_sighand so
> move the comment where it makes sense.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH 2/6] exec: Fix spelling of search_binary_handler in a comment
  2020-05-08 18:44   ` [PATCH 2/6] exec: Fix spelling of search_binary_handler in a comment Eric W. Biederman
@ 2020-05-09  5:03     ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-09  5:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Fri, May 08, 2020 at 01:44:46PM -0500, Eric W. Biederman wrote:
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex
  2020-05-08 18:45   ` [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex Eric W. Biederman
@ 2020-05-09  5:08     ` Kees Cook
  2020-05-09 19:18     ` Linus Torvalds
  1 sibling, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-09  5:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Fri, May 08, 2020 at 01:45:25PM -0500, Eric W. Biederman wrote:
> 
> Oleg modified the code that did
> "mutex_lock_interruptible(&current->cred_guard_mutex)" to return
> -ERESTARTNOINTR instead of -EINTR, so that userspace will never see a
> failure to grab the mutex.
> 
> Slightly earlier Liam R. Howlett defined mutex_lock_killable for
> exactly the same situation but it does it a little more cleanly.
> 
> Switch the code to mutex_lock_killable so that it is clearer what the
> code is doing.
> 
> Ref: ad776537cc6b ("Add mutex_lock_killable")
> Ref: 793285fcafce ("cred_guard_mutex: do not return -EINTR to user-space")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex
  2020-05-08 18:45   ` [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex Eric W. Biederman
@ 2020-05-09  5:15     ` Kees Cook
  2020-05-09 14:17       ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-09  5:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Fri, May 08, 2020 at 01:45:56PM -0500, Eric W. Biederman wrote:
> Like exec_mm_release sync_mm_rss is about flushing out the state of
> the old_mm, which does not need to happen under exec_update_mutex.
> 
> Make this explicit by moving sync_mm_rss outside of exec_update_mutex.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

Additional thoughts below...

> ---
>  fs/exec.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 11a5c073aa35..15682a1dfee9 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1051,13 +1051,14 @@ static int exec_mmap(struct mm_struct *mm)
>  	tsk = current;
>  	old_mm = current->mm;
>  	exec_mm_release(tsk, old_mm);
> +	if (old_mm)
> +		sync_mm_rss(old_mm);
>  
>  	ret = mutex_lock_killable(&tsk->signal->exec_update_mutex);
>  	if (ret)
>  		return ret;
>  
>  	if (old_mm) {
> -		sync_mm_rss(old_mm);
>  		/*
>  		 * Make sure that if there is a core dump in progress
>  		 * for the old mm, we get out and die instead of going

$ git grep exec_mm_release
fs/exec.c:      exec_mm_release(tsk, old_mm);
include/linux/sched/mm.h:extern void exec_mm_release(struct task_struct *, struct mm_struct *);
kernel/fork.c:void exec_mm_release(struct task_struct *tsk, struct mm_struct *mm)

kernel/fork.c:

void exit_mm_release(struct task_struct *tsk, struct mm_struct *mm)
{
        futex_exit_release(tsk);
        mm_release(tsk, mm);
}

void exec_mm_release(struct task_struct *tsk, struct mm_struct *mm)
{
        futex_exec_release(tsk);
        mm_release(tsk, mm);
}

$ git grep exit_mm_release
include/linux/sched/mm.h:extern void exit_mm_release(struct task_struct *, struct mm_struct *);
kernel/exit.c:  exit_mm_release(current, mm);
kernel/fork.c:void exit_mm_release(struct task_struct *tsk, struct mm_struct *mm)

kernel/exit.c:

        exit_mm_release(current, mm);
        if (!mm)
                return;
        sync_mm_rss(mm);

It looks to me like both exec_mm_release() and exit_mm_release() could
easily have the sync_mm_rss(...) folded into their function bodies and
removed from the callers. *shrug*

-- 
Kees Cook

* Re: [PATCH 5/6] exec: Move handling of the point of no return to the top level
  2020-05-08 18:47   ` [PATCH 5/6] exec: Move handling of the point of no return to the top level Eric W. Biederman
@ 2020-05-09  5:31     ` Kees Cook
  2020-05-09 13:39       ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-09  5:31 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Fri, May 08, 2020 at 01:47:10PM -0500, Eric W. Biederman wrote:
> 
> Move the handling of the point of no return from search_binary_handler
> into __do_execve_file so that it is easier to find, and to keep
> things robust in the face of change.
> 
> Make it clear that an existing fatal signal will take precedence over
> a forced SIGSEGV by not forcing SIGSEGV if a fatal signal is already
> pending.  This does not change the behavior but it saves a reader
> of the code the tedium of reading and understanding force_sig
> and the signal delivery code.
> 
> Update the comment in begin_new_exec about where SIGSEGV is forced.
> 
> Keep point_of_no_return from being a mystery by documenting
> what the code is doing where it forces SIGSEGV if the
> code is past the point of no return.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

I had to read the code around these changes a bit carefully, but yeah,
this looks like a safe cleanup. It is a behavioral change, though (in
that it unmasks non-SEGV fatal signals), so I do wonder if something
somewhere might notice this, but I'd agree that it's the more robust
behavior.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH 6/6] exec: Set the point of no return sooner
  2020-05-08 18:48   ` [PATCH 6/6] exec: Set the point of no return sooner Eric W. Biederman
@ 2020-05-09  5:33     ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-09  5:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Fri, May 08, 2020 at 01:48:13PM -0500, Eric W. Biederman wrote:
> 
> Make the code more robust by marking the point of no return sooner.
> This ensures that future code changes don't need to worry about how
> they return errors if they are past this point.
> 
> This results in no actual change in behavior, as __do_execve_file does
> not force SIGSEGV when a fatal signal is pending past the point of no
> return.  Further, the only errors that de_thread and exec_mmap can
> return occur when a fatal signal is already pending.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Yes, thank you. I'm a fan; this makes the comment above the function a
bit easier to understand, since the very first thing is to set the
point_of_no_return. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH 5/6] exec: Move handling of the point of no return to the top level
  2020-05-09  5:31     ` Kees Cook
@ 2020-05-09 13:39       ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 13:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

Kees Cook <keescook@chromium.org> writes:

> On Fri, May 08, 2020 at 01:47:10PM -0500, Eric W. Biederman wrote:
>> 
>> Move the handling of the point of no return from search_binary_handler
>> into __do_execve_file so that it is easier to find, and to keep
>> things robust in the face of change.
>> 
>> Make it clear that an existing fatal signal will take precedence over
>> a forced SIGSEGV by not forcing SIGSEGV if a fatal signal is already
>> pending.  This does not change the behavior but it saves a reader
>> of the code the tedium of reading and understanding force_sig
>> and the signal delivery code.
>> 
>> Update the comment in begin_new_exec about where SIGSEGV is forced.
>> 
>> Keep point_of_no_return from being a mystery by documenting
>> what the code is doing where it forces SIGSEGV if the
>> code is past the point of no return.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> I had to read the code around these changes a bit carefully, but yeah,
> this looks like a safe cleanup. It is a behavioral change, though (in
> that it unmasks non-SEGV fatal signals), so I do wonder if something
> somewhere might notice this, but I'd agree that it's the more robust
> behavior.

So the only behavioral change that I can see is that for non-SIGSEGV
fatal signals the signal handler for SIGSEGV will not be set to SIG_DFL
and SIGSEGV will not be removed from the task's local blocked signal set.

I think there is a good case that behavior change is a bug.

If you think that it was SIGSEGV that was being delivered, and that it
was masking another existing fatal signal, you would be incorrect.

If you look at:

fatal_signal_pending - you will see that it tests for SIGKILL on the
current task's queue.

complete_signal() - you will see that when a fatal (non-coredumpable)
signal is delivered it sets SIGKILL in every thread's local queue, as
well as setting SIGNAL_GROUP_EXIT.

get_signal - it special-cases SIGNAL_GROUP_EXIT and fast-forwards
to the end, so that a signal that has been delivered cannot be
overridden by another signal.

__send_signal - it tests SIGNAL_GROUP_EXIT and, if it is set, gets out
early (which applies to force_sigsegv among others).

So unless it is de_thread or a coredump that sets the task-local
SIGKILL, there is no chance for a forced SIGSEGV to do anything, and
the code has already tested for those two cases in de_thread and
exec_mmap before point_of_no_return is set.

So except for the SIGSEGV handler and blocked state there are no
behavior changes.
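The signal-delivery argument above can be sketched as a small user-space model. It encodes only the three behaviors listed in this message (the task-local SIGKILL test, complete_signal() marking the whole group, and __send_signal() getting out early once SIGNAL_GROUP_EXIT is set); the `model_*` types and functions are hypothetical and deliberately leave out ptrace, core dumps and signal masks.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NTHREADS 3

/* Hypothetical model of the signal state; not the kernel's structures. */
struct model_signal {
	bool group_exit;	/* stands in for SIGNAL_GROUP_EXIT */
	int group_exit_sig;	/* the signal that is killing the group */
};

struct model_thread {
	bool sigkill_pending;	/* task-local SIGKILL */
	bool sigsegv_pending;
};

struct model_group {
	struct model_signal sig;
	struct model_thread t[MODEL_NTHREADS];
};

/* fatal_signal_pending(): only looks at the thread's own SIGKILL bit. */
static bool model_fatal_signal_pending(const struct model_thread *t)
{
	return t->sigkill_pending;
}

/* complete_signal() for a fatal, non-coredumping signal. */
static void model_complete_fatal(struct model_group *g, int sig)
{
	g->sig.group_exit = true;
	g->sig.group_exit_sig = sig;
	for (int i = 0; i < MODEL_NTHREADS; i++)
		g->t[i].sigkill_pending = true;
}

/* __send_signal() as described: bail out once the group is exiting. */
static bool model_send_sigsegv(struct model_group *g, int tid)
{
	if (g->sig.group_exit)
		return false;	/* a forced SIGSEGV has no effect */
	g->t[tid].sigsegv_pending = true;
	return true;
}
```

In this model a forced SIGSEGV is only queued when no fatal signal has been delivered yet, so testing `!fatal_signal_pending` first just makes that outcome explicit.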


That does take some reading through all of that code to see what is
going on, and just saying !fatal_signal_pending makes it all much
clearer.


In the next patch, when I move the setting of point_of_no_return
earlier, that fatal_signal_pending check ensures that we don't stomp on
de_thread or coredump state with force_sigsegv(SIGSEGV).  But again,
that won't be a change in behavior either: until that patch, we never
perform the force_sigsegv test when either of those could be in
progress, so force_sigsegv never gets a chance to stomp on those cases.

Eric

* Re: [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex
  2020-05-09  5:15     ` Kees Cook
@ 2020-05-09 14:17       ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 14:17 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

Kees Cook <keescook@chromium.org> writes:

> $ git grep exec_mm_release
> fs/exec.c:      exec_mm_release(tsk, old_mm);
> include/linux/sched/mm.h:extern void exec_mm_release(struct task_struct *, struct mm_struct *);
> kernel/fork.c:void exec_mm_release(struct task_struct *tsk, struct mm_struct *mm)
>
> kernel/fork.c:
>
> void exit_mm_release(struct task_struct *tsk, struct mm_struct *mm)
> {
>         futex_exit_release(tsk);
>         mm_release(tsk, mm);
> }
>
> void exec_mm_release(struct task_struct *tsk, struct mm_struct *mm)
> {
>         futex_exec_release(tsk);
>         mm_release(tsk, mm);
> }
>
> $ git grep exit_mm_release
> include/linux/sched/mm.h:extern void exit_mm_release(struct task_struct *, struct mm_struct *);
> kernel/exit.c:  exit_mm_release(current, mm);
> kernel/fork.c:void exit_mm_release(struct task_struct *tsk, struct mm_struct *mm)
>
> kernel/exit.c:
>
>         exit_mm_release(current, mm);
>         if (!mm)
>                 return;
>         sync_mm_rss(mm);
>
> It looks to me like both exec_mm_release() and exit_mm_release() could
> easily have the sync_mm_rss(...) folded into their function bodies and
> removed from the callers. *shrug*

Well it would have to be all of:
	if (mm) 
		sync_mm_rss(mm);

I remember reading through exit_mm_release and seeing that nothing
actually depended upon a non-NULL mm, unless you have clear_child_tid
set.

I am not up to speed enough on that part of the mm layer right now to
know if it is a good idea to put sync_mm_rss in exit_mm_release, but at
a quick look it feels like it.
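A sketch of the refactor being discussed, assuming that mm_release() tolerates a NULL mm as described above. The `model_*` names are hypothetical stand-ins for exec_mm_release(), mm_release(), futex_exec_release() and sync_mm_rss(), and the task argument is dropped for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct; counts calls only. */
struct model_mm {
	int rss_synced;		/* times the model sync_mm_rss() ran */
	int released;		/* times the model mm_release() ran */
};

static void model_sync_mm_rss(struct model_mm *mm)
{
	mm->rss_synced++;
}

/* Assumed to tolerate mm == NULL, per the discussion above. */
static void model_mm_release(struct model_mm *mm)
{
	if (mm)
		mm->released++;
}

static void model_futex_exec_release(void)
{
	/* Placeholder for futex_exec_release(tsk); does nothing here. */
}

/* Suggested shape: the NULL check and the RSS sync move into the helper. */
static void model_exec_mm_release(struct model_mm *mm)
{
	model_futex_exec_release();
	model_mm_release(mm);
	if (mm)
		model_sync_mm_rss(mm);
}
```

Doing the RSS sync at the end of the helper keeps the order from patch 4/6, where sync_mm_rss runs right after exec_mm_release and before exec_update_mutex is taken.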

Eric


* Re: [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex
  2020-05-08 18:45   ` [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex Eric W. Biederman
  2020-05-09  5:08     ` Kees Cook
@ 2020-05-09 19:18     ` Linus Torvalds
  2020-05-09 19:57       ` Eric W. Biederman
  2020-05-10 20:33       ` Kees Cook
  1 sibling, 2 replies; 122+ messages in thread
From: Linus Torvalds @ 2020-05-09 19:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

On Fri, May 8, 2020 at 11:48 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>
> Oleg modified the code that did
> "mutex_lock_interruptible(&current->cred_guard_mutex)" to return
> -ERESTARTNOINTR instead of -EINTR, so that userspace will never see a
> failure to grab the mutex.
>
> Slightly earlier Liam R. Howlett defined mutex_lock_killable for
> exactly the same situation but it does it a little more cleanly.

What what what?

None of this makes sense. Your commit message is completely wrong, and
the patch is utter shite.

mutex_lock_interruptible() and mutex_lock_killable() are completely
different operations, and the difference has absolutely nothing to do
with -ERESTARTNOINTR or -EINTR.

mutex_lock_interruptible() is interrupted by any signal.

mutex_lock_killable() is - surprise surprise - only interrupted by
SIGKILL (in theory any fatal signal, but we never actually implemented
that logic, so it's only interruptible by the known-to-always-be-fatal
SIGKILL).
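A user-space sketch of the two wait conditions. It assumes an interruptible wait ends on any pending signal and a killable wait ends only when fatal_signal_pending() is true; the promotion of a group-killing signal to a task-local SIGKILL is taken from the complete_signal() description earlier in this thread. Ptrace, blocked signals and core-dumping signals are ignored, and every `model_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one pending signal; not kernel structures. */
struct model_sig {
	bool is_sigkill;
	bool has_handler;	/* userspace installed a handler */
	bool default_kills;	/* default action terminates, no core dump */
};

/*
 * Assumed promotion step: a signal that will kill the group is turned
 * into a task-local SIGKILL; a caught signal is not.
 */
static bool model_sigkill_queued(const struct model_sig *s)
{
	return s->is_sigkill || (s->default_kills && !s->has_handler);
}

/* mutex_lock_interruptible(): any pending signal ends the wait. */
static bool model_interruptible_wakes(const struct model_sig *s)
{
	(void)s;
	return true;
}

/* mutex_lock_killable(): ends the wait only if fatal_signal_pending(). */
static bool model_killable_wakes(const struct model_sig *s)
{
	return model_sigkill_queued(s);
}
```

Under these assumptions, a ^C that userspace catches with a handler breaks an interruptible wait but leaves a killable wait blocked. How a default-action ^C behaves depends entirely on the assumed promotion step.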

> Switch the code to mutex_lock_killable so that it is clearer what the
> code is doing.

This nonsensical patch makes me worry about all your other patches.
The explanation is wrong, the patch is wrong, and it changes things to
be fundamentally broken.

Before this, ^C would break out of a blocked execve()/ptrace()
situation. After this patch, you need special tools to do so.

This patch is completely wrong.

And Kees, what the heck is that "Reviewed-by" for? Worthless review too.

                Linus

* [PATCH 0/5] exec: Control flow simplifications
  2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
                     ` (5 preceding siblings ...)
  2020-05-08 18:48   ` [PATCH 6/6] exec: Set the point of no return sooner Eric W. Biederman
@ 2020-05-09 19:40   ` Eric W. Biederman
  2020-05-09 19:40     ` [PATCH 1/5] exec: Call cap_bprm_set_creds directly from prepare_binprm Eric W. Biederman
                       ` (5 more replies)
  6 siblings, 6 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 19:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


It is hard to follow the control flow in exec.c as the code has evolved
over time and something that used to work one way now works another.
This set of changes attempts to address the worst of that, to remove
unnecessary work and to make the code a little easier to follow.

The one rough point in my changes is that cap_bprm_set_creds probably
needs a new name, as I have taken it out of security_bprm_set_creds,
but my imagination failed to come up with anything better.

Eric W. Biederman (5):
      exec: Call cap_bprm_set_creds directly from prepare_binprm
      exec: Directly call security_bprm_set_creds from __do_execve_file
      exec: Remove recursion from search_binary_handler
      exec: Allow load_misc_binary to call prepare_binfmt unconditionally
      exec: Move the call of prepare_binprm into search_binary_handler

 arch/alpha/kernel/binfmt_loader.c |  5 +----
 fs/binfmt_em86.c                  |  7 +-----
 fs/binfmt_misc.c                  | 22 +++---------------
 fs/binfmt_script.c                |  5 +----
 fs/exec.c                         | 47 +++++++++++++++++++++------------------
 include/linux/binfmts.h           | 11 ++-------
 include/linux/security.h          |  2 +-
 security/apparmor/domain.c        |  3 ---
 security/commoncap.c              |  1 -
 security/selinux/hooks.c          |  2 --
 security/smack/smack_lsm.c        |  3 ---
 security/tomoyo/tomoyo.c          |  6 -----
 12 files changed, 34 insertions(+), 80 deletions(-)

---

I think this is a correct set of changes that makes things better, but
please look things over and review this code if you have any expertise
in anything I am touching.

Thank you,
Eric



* [PATCH 1/5] exec: Call cap_bprm_set_creds directly from prepare_binprm
  2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
@ 2020-05-09 19:40     ` Eric W. Biederman
  2020-05-09 20:04       ` Linus Torvalds
  2020-05-09 19:41     ` [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file Eric W. Biederman
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 19:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


The function cap_bprm_set_creds is the only implementation of
security_bprm_set_creds that does something both for the primary
executable file and for every interpreter.  The rest of the
implementations of security_bprm_set_creds do something only for the
primary executable file, even if that file is a shell script.

The function cap_bprm_set_creds is also special in that it is called
even when CONFIG_SECURITY is unset.

So call cap_bprm_set_creds separately to make these two cases explicit,
and to allow future changes to take advantage of these differences
to simplify the code.
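A user-space sketch of the call structure after this patch, assuming a `MODEL_CONFIG_SECURITY` macro stands in for CONFIG_SECURITY; all `model_*` names are hypothetical. With the macro undefined, the stub returns 0 like the patched inline in include/linux/security.h, yet the capability logic still runs because prepare_binprm calls it directly.

```c
#include <assert.h>

/* Hypothetical counter standing in for the capability logic. */
static int cap_calls;

static int model_cap_bprm_set_creds(void)
{
	cap_calls++;
	return 0;
}

/*
 * With MODEL_CONFIG_SECURITY undefined this mirrors the patched
 * !CONFIG_SECURITY stub: it returns 0 and no longer calls the
 * capability code itself.
 */
#ifdef MODEL_CONFIG_SECURITY
static int lsm_calls;
static int model_security_bprm_set_creds(void)
{
	lsm_calls++;
	return 0;
}
#else
static int model_security_bprm_set_creds(void)
{
	return 0;
}
#endif

/* Models prepare_binprm() as modified by this patch. */
static int model_prepare_binprm(void)
{
	int retval = model_security_bprm_set_creds();

	if (retval)
		return retval;
	/* Called directly, so it runs whether or not LSMs are configured. */
	return model_cap_bprm_set_creds();
}
```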

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                | 4 ++++
 include/linux/security.h | 2 +-
 security/commoncap.c     | 1 -
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b0620d5ebc66..765bfd51a546 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1641,6 +1641,10 @@ int prepare_binprm(struct linux_binprm *bprm)
 		return retval;
 	bprm->called_set_creds = 1;
 
+	retval = cap_bprm_set_creds(bprm);
+	if (retval)
+		return retval;
+
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
 }
diff --git a/include/linux/security.h b/include/linux/security.h
index a8d9310472df..c1aa1638429a 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -571,7 +571,7 @@ static inline int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
 
 static inline int security_bprm_set_creds(struct linux_binprm *bprm)
 {
-	return cap_bprm_set_creds(bprm);
+	return 0;
 }
 
 static inline int security_bprm_check(struct linux_binprm *bprm)
diff --git a/security/commoncap.c b/security/commoncap.c
index f4ee0ae106b2..3757988abe42 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1346,7 +1346,6 @@ static struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(ptrace_traceme, cap_ptrace_traceme),
 	LSM_HOOK_INIT(capget, cap_capget),
 	LSM_HOOK_INIT(capset, cap_capset),
-	LSM_HOOK_INIT(bprm_set_creds, cap_bprm_set_creds),
 	LSM_HOOK_INIT(inode_need_killpriv, cap_inode_need_killpriv),
 	LSM_HOOK_INIT(inode_killpriv, cap_inode_killpriv),
 	LSM_HOOK_INIT(inode_getsecurity, cap_inode_getsecurity),
-- 
2.25.0


* [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file
  2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
  2020-05-09 19:40     ` [PATCH 1/5] exec: Call cap_bprm_set_creds directly from prepare_binprm Eric W. Biederman
@ 2020-05-09 19:41     ` Eric W. Biederman
  2020-05-09 20:07       ` Linus Torvalds
  2020-05-11  3:15       ` Kees Cook
  2020-05-09 19:41     ` [PATCH 3/5] exec: Remove recursion from search_binary_handler Eric W. Biederman
                       ` (3 subsequent siblings)
  5 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 19:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Now that security_bprm_set_creds is no longer responsible for calling
cap_bprm_set_creds, security_bprm_set_creds only does something for
the primary file that is being executed (not any interpreters it may
have).  Therefore call security_bprm_set_creds from __do_execve_file
instead of from prepare_binprm, so that it is only called once, and
remove the now unnecessary called_set_creds field of struct linux_binprm.
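A minimal counting sketch of the resulting call pattern, assuming a "#!" script counts as two levels because binfmt_script calls prepare_binprm again for the interpreter. The `model_*` functions and the counters are hypothetical.

```c
#include <assert.h>

/* Hypothetical counters for the LSM hook and the capability logic. */
static int lsm_calls, cap_calls;

static int model_security_bprm_set_creds(void)
{
	lsm_calls++;
	return 0;
}

static int model_cap_bprm_set_creds(void)
{
	cap_calls++;
	return 0;
}

/* After the patch, prepare_binprm() only runs the capability logic. */
static int model_prepare_binprm(void)
{
	return model_cap_bprm_set_creds();
}

/*
 * Models __do_execve_file(): the LSM hook runs once for the primary
 * file, then prepare_binprm() runs once per level.
 */
static int model_do_execve(int levels)
{
	int retval = model_security_bprm_set_creds();

	if (retval)
		return retval;
	for (int i = 0; i < levels; i++) {
		retval = model_prepare_binprm();
		if (retval)
			return retval;
	}
	return 0;
}
```

Because the hook now runs exactly once per exec, the LSMs no longer need the called_set_creds flag to skip interpreter passes.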

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                  | 11 +++++------
 include/linux/binfmts.h    |  6 ------
 security/apparmor/domain.c |  3 ---
 security/selinux/hooks.c   |  2 --
 security/smack/smack_lsm.c |  3 ---
 security/tomoyo/tomoyo.c   |  6 ------
 6 files changed, 5 insertions(+), 26 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 765bfd51a546..635b5085050c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1635,12 +1635,6 @@ int prepare_binprm(struct linux_binprm *bprm)
 
 	bprm_fill_uid(bprm);
 
-	/* fill in binprm security blob */
-	retval = security_bprm_set_creds(bprm);
-	if (retval)
-		return retval;
-	bprm->called_set_creds = 1;
-
 	retval = cap_bprm_set_creds(bprm);
 	if (retval)
 		return retval;
@@ -1858,6 +1852,11 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval < 0)
 		goto out;
 
+	/* fill in binprm security blob */
+	retval = security_bprm_set_creds(bprm);
+	if (retval)
+		goto out;
+
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		goto out;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 1b48e2154766..42f760acfc2c 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,12 +26,6 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
-		/*
-		 * True after the bprm_set_creds hook has been called once
-		 * (multiple calls can be made via prepare_binprm() for
-		 * binfmt_script/misc).
-		 */
-		called_set_creds:1,
 		/*
 		 * True if most recent call to the commoncaps bprm_set_creds
 		 * hook (due to multiple prepare_binprm() calls from the
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 6ceb74e0f789..61b9181a9e1f 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -875,9 +875,6 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 		file_inode(bprm->file)->i_mode
 	};
 
-	if (bprm->called_set_creds)
-		return 0;
-
 	ctx = task_ctx(current);
 	AA_BUG(!cred_label(bprm->cred));
 	AA_BUG(!ctx);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0b4e32161b77..ff3e1be53da5 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2297,8 +2297,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 
 	/* SELinux context only depends on initial program or script and not
 	 * the script interpreter */
-	if (bprm->called_set_creds)
-		return 0;
 
 	old_tsec = selinux_cred(current_cred());
 	new_tsec = selinux_cred(bprm->cred);
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 8c61d175e195..bd1967730fec 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -904,9 +904,6 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
 	struct superblock_smack *sbsp;
 	int rc;
 
-	if (bprm->called_set_creds)
-		return 0;
-
 	isp = smack_inode(inode);
 	if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task)
 		return 0;
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 716c92ec941a..d965ce80a7fb 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -71,12 +71,6 @@ static void tomoyo_bprm_committed_creds(struct linux_binprm *bprm)
  */
 static int tomoyo_bprm_set_creds(struct linux_binprm *bprm)
 {
-	/*
-	 * Do only if this function is called for the first time of an execve
-	 * operation.
-	 */
-	if (bprm->called_set_creds)
-		return 0;
 	/*
 	 * Load policy if /sbin/tomoyo-init exists and /sbin/init is requested
 	 * for the first time.
-- 
2.25.0


* [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
  2020-05-09 19:40     ` [PATCH 1/5] exec: Call cap_bprm_set_creds directly from prepare_binprm Eric W. Biederman
  2020-05-09 19:41     ` [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file Eric W. Biederman
@ 2020-05-09 19:41     ` Eric W. Biederman
  2020-05-09 20:16       ` Linus Torvalds
  2020-05-10  4:22       ` Tetsuo Handa
  2020-05-09 19:42     ` [PATCH 4/5] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
                       ` (2 subsequent siblings)
  5 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 19:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Instead of recursing in search_binary_handler, have the methods that
would recurse return a positive value, and simply loop in exec_binprm.

This is a trivial code change, as all of the methods that would
recurse do so as effectively the last thing they do.
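The recursion-to-loop conversion can be sketched in user space, assuming each file either loads or names exactly one interpreter. The `model_*` types and the `interp` index are hypothetical, and the real loaders also call prepare_binprm before returning 1; the depth check copies the one added to exec_binprm.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model: each "file" is either a final binary or names an
 * interpreter (the index of the next file), like a "#!" script.
 */
struct model_file {
	int interp;		/* -1 if this is the final binary */
};

struct model_bprm {
	const struct model_file *files;
	int cur;		/* index of the file being executed */
};

/* A handler returns 1 after rewriting bprm to point at an interpreter. */
static int model_search_binary_handler(struct model_bprm *bprm)
{
	int next = bprm->files[bprm->cur].interp;

	if (next < 0)
		return 0;	/* loaded: done */
	bprm->cur = next;
	return 1;		/* search again for the interpreter */
}

/* Same loop and depth check as the patched exec_binprm(). */
static int model_exec_binprm(struct model_bprm *bprm)
{
	int ret, depth = 0;

	do {
		depth++;
		ret = model_search_binary_handler(bprm);
		if ((ret > 0) && (depth > 5))
			ret = -ELOOP;
	} while (ret > 0);
	return ret;
}
```

A self-referencing "interpreter" now ends in -ELOOP after a bounded number of iterations instead of growing the kernel stack.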

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/alpha/kernel/binfmt_loader.c |  2 +-
 fs/binfmt_em86.c                  |  2 +-
 fs/binfmt_misc.c                  |  5 +----
 fs/binfmt_script.c                |  2 +-
 fs/exec.c                         | 20 +++++++++-----------
 include/linux/binfmts.h           |  2 --
 6 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c
index a8d0d6e06526..a90c8b1d5498 100644
--- a/arch/alpha/kernel/binfmt_loader.c
+++ b/arch/alpha/kernel/binfmt_loader.c
@@ -38,7 +38,7 @@ static int load_binary(struct linux_binprm *bprm)
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		return retval;
-	return search_binary_handler(bprm);
+	return 1; /* Search for the interpreter */
 }
 
 static struct linux_binfmt loader_format = {
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index 466497860c62..a9b9ac7f9bb0 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -95,7 +95,7 @@ static int load_em86(struct linux_binprm *bprm)
 	if (retval < 0)
 		return retval;
 
-	return search_binary_handler(bprm);
+	return 1; /* Search for the interpreter */
 }
 
 static struct linux_binfmt em86_format = {
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index cdb45829354d..127fae9c21ab 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -234,10 +234,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (retval < 0)
 		goto error;
 
-	retval = search_binary_handler(bprm);
-	if (retval < 0)
-		goto error;
-
+	retval = 1; /* Search for the interpreter */
 ret:
 	dput(fmt->dentry);
 	return retval;
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index e9e6a6f4a35f..76a05696d376 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -146,7 +146,7 @@ static int load_script(struct linux_binprm *bprm)
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		return retval;
-	return search_binary_handler(bprm);
+	return 1; /* Search for the interpreter */
 }
 
 static struct linux_binfmt script_format = {
diff --git a/fs/exec.c b/fs/exec.c
index 635b5085050c..8bbf5fa785a6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1690,16 +1690,12 @@ EXPORT_SYMBOL(remove_arg_zero);
 /*
  * cycle the list of binary formats handler, until one recognizes the image
  */
-int search_binary_handler(struct linux_binprm *bprm)
+static int search_binary_handler(struct linux_binprm *bprm)
 {
 	bool need_retry = IS_ENABLED(CONFIG_MODULES);
 	struct linux_binfmt *fmt;
 	int retval;
 
-	/* This allows 4 levels of binfmt rewrites before failing hard. */
-	if (bprm->recursion_depth > 5)
-		return -ELOOP;
-
 	retval = security_bprm_check(bprm);
 	if (retval)
 		return retval;
@@ -1712,10 +1708,7 @@ int search_binary_handler(struct linux_binprm *bprm)
 			continue;
 		read_unlock(&binfmt_lock);
 
-		bprm->recursion_depth++;
 		retval = fmt->load_binary(bprm);
-		bprm->recursion_depth--;
-
 		read_lock(&binfmt_lock);
 		put_binfmt(fmt);
 		if (bprm->point_of_no_return || !bprm->file ||
@@ -1738,12 +1731,11 @@ int search_binary_handler(struct linux_binprm *bprm)
 
 	return retval;
 }
-EXPORT_SYMBOL(search_binary_handler);
 
 static int exec_binprm(struct linux_binprm *bprm)
 {
 	pid_t old_pid, old_vpid;
-	int ret;
+	int ret, depth = 0;
 
 	/* Need to fetch pid before load_binary changes it */
 	old_pid = current->pid;
@@ -1751,7 +1743,13 @@ static int exec_binprm(struct linux_binprm *bprm)
 	old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
 	rcu_read_unlock();
 
-	ret = search_binary_handler(bprm);
+	do {
+		depth++;
+		ret = search_binary_handler(bprm);
+		/* This allows 4 levels of binfmt rewrites before failing hard. */
+		if ((ret > 0) && (depth > 5))
+			ret = -ELOOP;
+	} while (ret > 0);
 	if (ret >= 0) {
 		audit_bprm(bprm);
 		trace_sched_process_exec(current, old_pid, bprm);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 42f760acfc2c..89f1135dcb75 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -47,7 +47,6 @@ struct linux_binprm {
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif
-	unsigned int recursion_depth; /* only for search_binary_handler() */
 	struct file * file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
@@ -118,7 +117,6 @@ extern void unregister_binfmt(struct linux_binfmt *);
 
 extern int prepare_binprm(struct linux_binprm *);
 extern int __must_check remove_arg_zero(struct linux_binprm *);
-extern int search_binary_handler(struct linux_binprm *);
 extern int begin_new_exec(struct linux_binprm * bprm);
 extern void setup_new_exec(struct linux_binprm * bprm);
 extern void finalize_exec(struct linux_binprm *bprm);
-- 
2.25.0

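The conversion works because every loader that recursed did so as its very last action, i.e. as a tail call. The sketch below is only an illustration: it is a userspace model, not kernel code. `struct model_bprm`, `search_recursive()`, `search_once()` and `search_loop()` are hypothetical names, and an interpreter chain is reduced to a counter of rewrites still to perform. In this model, the old recursive shape and the patch's do/while loop accept the same chains and run the same number of searches.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical userspace model, not kernel code.  A search returns 0
 * when the final image is found, a negative errno on failure, or (loop
 * form only) 1 when a loader swapped in an interpreter.
 */
struct model_bprm {
	int rewrites_needed;	/* interpreter hops still to perform */
	int searches;		/* how many times a search ran */
};

/* Old shape: the loader recurses back into the search. */
static int search_recursive(struct model_bprm *b, unsigned int depth)
{
	if (depth > 5)
		return -ELOOP;	/* mirrors the old recursion_depth check */
	b->searches++;
	if (b->rewrites_needed == 0)
		return 0;	/* final image found */
	b->rewrites_needed--;
	return search_recursive(b, depth + 1);	/* tail call */
}

/* New shape: one pass that returns 1 instead of recursing. */
static int search_once(struct model_bprm *b)
{
	b->searches++;
	if (b->rewrites_needed == 0)
		return 0;
	b->rewrites_needed--;
	return 1;	/* "Search for the interpreter" */
}

/* The do/while loop the patch adds to exec_binprm(). */
static int search_loop(struct model_bprm *b)
{
	int ret, depth = 0;

	do {
		depth++;
		ret = search_once(b);
		if ((ret > 0) && (depth > 5))
			ret = -ELOOP;
	} while (ret > 0);
	return ret;
}
```

For example, with five rewrites pending both forms succeed after six searches, and with six pending both return -ELOOP after six searches.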

* [PATCH 4/5] exec: Allow load_misc_binary to call prepare_binfmt unconditionally
  2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
                       ` (2 preceding siblings ...)
  2020-05-09 19:41     ` [PATCH 3/5] exec: Remove recursion from search_binary_handler Eric W. Biederman
@ 2020-05-09 19:42     ` Eric W. Biederman
  2020-05-11 22:09       ` Kees Cook
  2020-05-09 19:42     ` [PATCH 5/5] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
  5 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 19:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Add a flag preserve_creds that binfmt_misc can set to prevent
credentials from being updated.  This allows binfmt_misc to always
call prepare_binprm, which in turn allows the credential computation
logic to be consolidated.

Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/binfmt_misc.c        | 15 +++------------
 fs/exec.c               | 14 +++++++++-----
 include/linux/binfmts.h |  2 ++
 3 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 127fae9c21ab..16bfafd2671d 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -218,19 +218,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
 		goto error;
 
 	bprm->file = interp_file;
-	if (fmt->flags & MISC_FMT_CREDENTIALS) {
-		loff_t pos = 0;
-
-		/*
-		 * No need to call prepare_binprm(), it's already been
-		 * done.  bprm->buf is stale, update from interp_file.
-		 */
-		memset(bprm->buf, 0, BINPRM_BUF_SIZE);
-		retval = kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE,
-				&pos);
-	} else
-		retval = prepare_binprm(bprm);
+	if (fmt->flags & MISC_FMT_CREDENTIALS)
+		bprm->preserve_creds = 1;
 
+	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		goto error;
 
diff --git a/fs/exec.c b/fs/exec.c
index 8bbf5fa785a6..01dbeb025c46 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1630,14 +1630,18 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
  */
 int prepare_binprm(struct linux_binprm *bprm)
 {
-	int retval;
 	loff_t pos = 0;
 
-	bprm_fill_uid(bprm);
+	if (!bprm->preserve_creds) {
+		int retval;
 
-	retval = cap_bprm_set_creds(bprm);
-	if (retval)
-		return retval;
+		bprm_fill_uid(bprm);
+
+		retval = cap_bprm_set_creds(bprm);
+		if (retval)
+			return retval;
+	}
+	bprm->preserve_creds = 0;
 
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 89f1135dcb75..cb016f001e7a 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,6 +26,8 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
+		/* Don't update the creds for an interpreter (see binfmt_misc) */
+		preserve_creds:1,
 		/*
 		 * True if most recent call to the commoncaps bprm_set_creds
 		 * hook (due to multiple prepare_binprm() calls from the
-- 
2.25.0

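prepare_binprm() clears preserve_creds on every call, so the flag is one-shot. It preserves the credentials across the single binfmt_misc hop that requested it, and any later interpreter has its credentials recomputed. The sketch below models only this flag handling. The struct, its counters, and model_prepare_binprm() are hypothetical stand-ins for the code in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the one-shot preserve_creds flag.
 * creds_computed counts how often the credential step (bprm_fill_uid()
 * plus cap_bprm_set_creds() in the real code) runs; buf_reads counts
 * the header re-reads, which always happen.
 */
struct model_bprm {
	bool preserve_creds;
	int creds_computed;
	int buf_reads;
};

static int model_prepare_binprm(struct model_bprm *b)
{
	if (!b->preserve_creds)
		b->creds_computed++;	/* recompute creds for this file */
	b->preserve_creds = false;	/* flag only covers this one call */
	b->buf_reads++;			/* bprm->buf is always refreshed */
	return 0;
}
```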

* [PATCH 5/5] exec: Move the call of prepare_binprm into search_binary_handler
  2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
                       ` (3 preceding siblings ...)
  2020-05-09 19:42     ` [PATCH 4/5] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
@ 2020-05-09 19:42     ` Eric W. Biederman
  2020-05-11 22:24       ` Kees Cook
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
  5 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 19:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


The code in prepare_binprm needs to be run every time
search_binary_handler is called, so move the call into
search_binary_handler itself to make the code simpler and easier to
understand.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/alpha/kernel/binfmt_loader.c |  3 ---
 fs/binfmt_em86.c                  |  5 -----
 fs/binfmt_misc.c                  |  4 ----
 fs/binfmt_script.c                |  3 ---
 fs/exec.c                         | 12 +++++-------
 include/linux/binfmts.h           |  1 -
 6 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c
index a90c8b1d5498..ec7b26e4b81a 100644
--- a/arch/alpha/kernel/binfmt_loader.c
+++ b/arch/alpha/kernel/binfmt_loader.c
@@ -35,9 +35,6 @@ static int load_binary(struct linux_binprm *bprm)
 
 	bprm->file = file;
 	bprm->loader = loader;
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
 	return 1; /* Search for the interpreter */
 }
 
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index a9b9ac7f9bb0..2726bfb832b2 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -90,11 +90,6 @@ static int load_em86(struct linux_binprm *bprm)
 		return PTR_ERR(file);
 
 	bprm->file = file;
-
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
-
 	return 1; /* Search for the interpreter */
 }
 
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 16bfafd2671d..6b5e67eed65e 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -221,10 +221,6 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (fmt->flags & MISC_FMT_CREDENTIALS)
 		bprm->preserve_creds = 1;
 
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		goto error;
-
 	retval = 1; /* Search for the interpreter */
 ret:
 	dput(fmt->dentry);
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 76a05696d376..ed4607c7095e 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -143,9 +143,6 @@ static int load_script(struct linux_binprm *bprm)
 		return PTR_ERR(file);
 
 	bprm->file = file;
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
 	return 1; /* Search for the interpreter */
 }
 
diff --git a/fs/exec.c b/fs/exec.c
index 01dbeb025c46..206f18120073 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1628,7 +1628,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
  *
  * This may be called multiple times for binary chains (scripts for example).
  */
-int prepare_binprm(struct linux_binprm *bprm)
+static int prepare_binprm(struct linux_binprm *bprm)
 {
 	loff_t pos = 0;
 
@@ -1647,8 +1647,6 @@ int prepare_binprm(struct linux_binprm *bprm)
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
 }
 
-EXPORT_SYMBOL(prepare_binprm);
-
 /*
  * Arguments are '\0' separated strings found at the location bprm->p
  * points to; chop off the first by relocating brpm->p to right after
@@ -1700,6 +1698,10 @@ static int search_binary_handler(struct linux_binprm *bprm)
 	struct linux_binfmt *fmt;
 	int retval;
 
+	retval = prepare_binprm(bprm);
+	if (retval < 0)
+		return retval;
+
 	retval = security_bprm_check(bprm);
 	if (retval)
 		return retval;
@@ -1859,10 +1861,6 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval)
 		goto out;
 
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		goto out;
-
 	retval = copy_strings_kernel(1, &bprm->filename, bprm);
 	if (retval < 0)
 		goto out;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index cb016f001e7a..0748afca40cb 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -117,7 +117,6 @@ static inline void insert_binfmt(struct linux_binfmt *fmt)
 
 extern void unregister_binfmt(struct linux_binfmt *);
 
-extern int prepare_binprm(struct linux_binprm *);
 extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int begin_new_exec(struct linux_binprm * bprm);
 extern void setup_new_exec(struct linux_binprm * bprm);
-- 
2.25.0

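With the call moved, every pass of the search begins by re-preparing whatever bprm->file currently points to. The individual loaders therefore only need to swap the file and return 1. A hypothetical userspace model of that flow follows. model_prepare(), model_search() and model_exec() stand in for prepare_binprm(), search_binary_handler() and exec_binprm(), and the security_checks counter stands in for security_bprm_check().

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model: every search pass prepares the current file
 * first, so loaders only swap the file and return 1.
 */
struct model_bprm {
	int hops_left;		/* interpreters still to swap in */
	int prepares;		/* prepare_binprm() stand-in calls */
	int security_checks;	/* security_bprm_check() stand-in calls */
	int fail_prepare_at;	/* make the Nth prepare fail, 0 = never */
};

static int model_prepare(struct model_bprm *b)
{
	b->prepares++;
	if (b->prepares == b->fail_prepare_at)
		return -EIO;	/* e.g. reading the header failed */
	return 0;
}

static int model_search(struct model_bprm *b)
{
	int retval = model_prepare(b);	/* now done here, once per pass */

	if (retval < 0)
		return retval;
	b->security_checks++;
	if (b->hops_left == 0)
		return 0;		/* a loader accepted the file */
	b->hops_left--;
	return 1;			/* loader swapped in an interpreter */
}

static int model_exec(struct model_bprm *b)
{
	int ret, depth = 0;

	do {
		depth++;
		ret = model_search(b);
		if (ret > 0 && depth > 5)
			ret = -ELOOP;
	} while (ret > 0);
	return ret;
}
```

In this model, a failing prepare step aborts its pass before the security check runs, and a successful exec prepares each file in the chain exactly once.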

* Re: [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex
  2020-05-09 19:18     ` Linus Torvalds
@ 2020-05-09 19:57       ` Eric W. Biederman
  2020-05-10 20:33       ` Kees Cook
  1 sibling, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 19:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, May 8, 2020 at 11:48 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>>
>> Oleg modified the code that did
>> "mutex_lock_interruptible(&current->cred_guard_mutex)" to return
>> -ERESTARTNOINTR instead of -EINTR, so that userspace will never see a
>> failure to grab the mutex.
>>
>> Slightly earlier Liam R. Howlett defined mutex_lock_killable for
>> exactly the same situation but it does it a little more cleanly.
>
> What what what?
>
> None of this makes sense. Your commit message is completely wrong, and
> the patch is utter shite.
>
> mutex_lock_interruptible() and mutex_lock_killable() are completely
> different operations, and the difference has absolutely nothing to do
> with  -ERESTARTNOINTR or -EINTR.
>
> mutex_lock_interruptible() is interrupted by any signal.
>
> mutex_lock_killable() is - surprise surprise - only interrupted by
> SIGKILL (in theory any fatal signal, but we never actually implemented
> that logic, so it's only interruptible by the known-to-always-be-fatal
> SIGKILL).
>
>> Switch the code to mutex_lock_killable so that it is clearer what the
>> code is doing.
>
> This nonsensical patch makes me worry about all your other patches.
> The explanation is wrong, the patch is wrong, and it changes things to
> be fundamentally broken.
>
> Before this, ^C would break out of a blocked execve()/ptrace()
> situation. After this patch, you need special tools to do so.
>
> This patch is completely wrong.

Sigh.  Brain fart on my part. You are correct.

I saw the restart, and totally forgot that it allows the handling of a
signal before restarting the system call.

Except for the handling of the signal in userspace, it is the same as
mutex_lock_killable, but that is a big, big, big if.

My apologies.  I will drop this patch.

Eric

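Eric's follow-up identifies where the two primitives actually differ: a signal for which the task has a handler. The following is a deliberately simplified, hypothetical userspace model. It calls no kernel API, and the enum and function names are invented. It covers only SIGKILL and caught signals. Uncaught signals, whose delivery takes other paths in the real kernel, are left out.

```c
#include <assert.h>
#include <signal.h>

/*
 * Hypothetical model; no kernel API is used.  It classifies what happens
 * to an execve() sleeping on cred_guard_mutex when one signal arrives.
 * Only SIGKILL and signals with a userspace handler are modelled.
 */
enum model_lock { MODEL_INTERRUPTIBLE, MODEL_KILLABLE };
enum model_outcome {
	MODEL_KEEPS_WAITING,
	MODEL_RESTARTS_AFTER_HANDLER,
	MODEL_TASK_DIES,
};

static enum model_outcome model_signal(enum model_lock kind, int sig)
{
	/* SIGKILL cannot be caught and wakes both kinds of sleep. */
	if (sig == SIGKILL)
		return MODEL_TASK_DIES;

	/* A killable sleep does not wake for a caught signal. */
	if (kind == MODEL_KILLABLE)
		return MODEL_KEEPS_WAITING;

	/*
	 * An interruptible sleep gives up; returning -ERESTARTNOINTR lets
	 * the handler run and then restarts the system call, so userspace
	 * never sees EINTR.
	 */
	return MODEL_RESTARTS_AFTER_HANDLER;
}
```

So a caught ^C lets the interruptible variant run the handler and then restart the execve(), while the killable variant stays blocked on the mutex.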

* Re: [PATCH 1/5] exec: Call cap_bprm_set_creds directly from prepare_binprm
  2020-05-09 19:40     ` [PATCH 1/5] exec: Call cap_bprm_set_creds directly from prepare_binprm Eric W. Biederman
@ 2020-05-09 20:04       ` Linus Torvalds
  0 siblings, 0 replies; 122+ messages in thread
From: Linus Torvalds @ 2020-05-09 20:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	LSM List, James Morris, Serge E. Hallyn, Andy Lutomirski

On Sat, May 9, 2020 at 12:44 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> The function cap_bprm_set_creds is the only instance of
> security_bprm_set_creds that does something for the primary executable
> file and for every interpreter the rest of the implementations of
> security_bprm_set_creds do something only for the primary executable
> file even if that file is a shell script.

Eric, can you please re-write that sentence as something that can be
parsed and understood?

I'm pretty sure that what you are talking about is the whole
"called_set_creds" flag logic, where the logic is that some security
layers only react to the first one, while the capability checks are
done for every one.

But there is no way to realize that from your description above. In
fact, the description above is actively incorrect and misleading,
since you say that "cap_bprm_set_creds is the only instance [..] that
does something for the primary executable"

I think that you mean to say that it does something for *every*
instance of the executable, not just the primary one.

> The function cap_bprm_set_creds is also special in that it is called
> even when CONFIG_SECURITY is unset.
>
> So calling cap_bprm_set_creds separately to make these two cases explicit,
> and allow future changes to take advantages of these differences
> to simplify the code.

I think you need to rename "security_bprm_set_creds()" too, to show
what it does. Since it clearly no longer does that "bprm_set_creds()"
from the common capabilities.

In fact, I think it would probably be good to change the patch too, so
that it is actually understandable what the heck the logic is.

Instead of

        retval = security_bprm_set_creds(bprm);
        if (retval)
                return retval;
        bprm->called_set_creds = 1;
        retval = cap_bprm_set_creds(bprm);
        if (retval)
                return retval;

which makes no sense at all when you read it, do this:

        /* Every instance of the executable gets called for capabilities */
        retval = cap_bprm_set_creds(bprm);
        if (retval)
                return retval;

        /* Other security layers only want the primary executable */
        if (!bprm->called_set_creds) {
                retval = security_primary_bprm_set_creds(bprm);
                if (retval)
                         return retval;
                bprm->called_set_creds = 1;
        }

which now actually describes what is going on.

Then remove the 'called_set_creds' logic from the security layers, and
rename those 'xyz_bprm_set_creds()' to be
'xyz_primary_bprm_set_creds()'.

After that, and with a proper commit message that actually explains
this _properly_, this looks like a cleanup.

Because right now that patch description makes zero sense at all, and
the patch itself results in this insane situation where
"security_bprm_set_creds()" expressly doesn't call the basic
"cap_bprm_set_creds()" at all, which just makes things very very
confusing and the naming actively misleading.

               Linus

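Linus's suggested structure can be exercised outside the kernel. The sketch below is a hypothetical model rather than kernel code. model_cap_set_creds() stands in for cap_bprm_set_creds(), and model_primary_set_creds() stands in for the proposed security_primary_bprm_set_creds(), which does not exist at this point in the thread. Calling model_set_creds() once per pass of a script-plus-interpreter exec runs the capability step on every pass and the primary hook only on the first.

```c
#include <assert.h>

/*
 * Hypothetical model of the proposed ordering; the counters record how
 * often each stand-in hook runs.
 */
struct model_bprm {
	unsigned int called_set_creds:1;
	int cap_calls;
	int primary_calls;
};

static int model_cap_set_creds(struct model_bprm *b)
{
	b->cap_calls++;
	return 0;
}

static int model_primary_set_creds(struct model_bprm *b)
{
	b->primary_calls++;
	return 0;
}

static int model_set_creds(struct model_bprm *b)
{
	int retval;

	/* Every instance of the executable gets called for capabilities */
	retval = model_cap_set_creds(b);
	if (retval)
		return retval;

	/* Other security layers only want the primary executable */
	if (!b->called_set_creds) {
		retval = model_primary_set_creds(b);
		if (retval)
			return retval;
		b->called_set_creds = 1;
	}
	return 0;
}
```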

* Re: [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file
  2020-05-09 19:41     ` [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file Eric W. Biederman
@ 2020-05-09 20:07       ` Linus Torvalds
  2020-05-09 20:12         ` Eric W. Biederman
  2020-05-11  3:15       ` Kees Cook
  1 sibling, 1 reply; 122+ messages in thread
From: Linus Torvalds @ 2020-05-09 20:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	LSM List, James Morris, Serge E. Hallyn, Andy Lutomirski

On Sat, May 9, 2020 at 12:44 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Now that security_bprm_set_creds is no longer responsible for calling
> cap_bprm_set_creds, security_bprm_set_creds only does something for
> the primary file that is being executed (not any interpreters it may
> have).  Therefore call security_bprm_set_creds from __do_execve_file,
> instead of from prepare_binprm so that it is only called once, and
> remove the now unnecessary called_set_creds field of struct binprm.

Ahh, good, this patch removes the 'called_set_creds' logic from the
security subsystems.

So it does half of what I asked for: please also just rename that
"security_bprm_set_creds()" to be "security_primary_bprm_set_creds()"
so that the change of semantics also shows up that way.

And so that there is no confusion about the fact that
"cap_bprm_set_creds()" has absolutely nothing to do with
"security_bprm_set_creds()" any more.

             Linus


* Re: [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file
  2020-05-09 20:07       ` Linus Torvalds
@ 2020-05-09 20:12         ` Eric W. Biederman
  2020-05-09 20:19           ` Linus Torvalds
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-09 20:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	LSM List, James Morris, Serge E. Hallyn, Andy Lutomirski

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, May 9, 2020 at 12:44 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> Now that security_bprm_set_creds is no longer responsible for calling
>> cap_bprm_set_creds, security_bprm_set_creds only does something for
>> the primary file that is being executed (not any interpreters it may
>> have).  Therefore call security_bprm_set_creds from __do_execve_file,
>> instead of from prepare_binprm so that it is only called once, and
>> remove the now unnecessary called_set_creds field of struct binprm.
>
> Ahh, good, this patch removes the 'called_set_creds' logic from the
> security subsystems.
>
> So it does half of what I asked for: please also just rename that
> "security_bprm_set_creds()" to be "security_primary_bprm_set_creds()"
> so that the change of semantics also shows up that way.
>
> And so that there is no confusion about the fact that
> "cap_bprm_set_creds()" has absolutely nothing to do with
> "security_bprm_set_creds()" any more.

I agree something needs to be renamed, to remove confusion.

I am off for a nap now, and tomorrow is Mother's Day, so I probably won't
be back to this seriously until Monday.  But please dissect these patches
and I will address any problems.

Eric



* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-09 19:41     ` [PATCH 3/5] exec: Remove recursion from search_binary_handler Eric W. Biederman
@ 2020-05-09 20:16       ` Linus Torvalds
  2020-05-10  4:22       ` Tetsuo Handa
  1 sibling, 0 replies; 122+ messages in thread
From: Linus Torvalds @ 2020-05-09 20:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	LSM List, James Morris, Serge E. Hallyn, Andy Lutomirski

On Sat, May 9, 2020 at 12:45 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Instead of recursing in search_binary_handler have the methods that
> would recurse return a positive value, and simply loop in exec_binprm.
>
> This is a trivial change as all of the methods that would recurse do
> so as effectively the last thing they do.   Making this a trivial code
> change.

Looks good.

I'd suggest doing that loop slightly differently:

> -       ret = search_binary_handler(bprm);
> +       do {
> +               depth++;
> +               ret = search_binary_handler(bprm);
> +               /* This allows 4 levels of binfmt rewrites before failing hard. */
> +               if ((ret > 0) && (depth > 5))
> +                       ret = -ELOOP;
> +       } while (ret > 0);
>          if (ret >= 0) {

That's really an odd way to write this.

So honestly, if "ret < 0", then we can just return directly.

So I think it would make much more sense to do this loop something like

        for (depth = 0; depth < 5; depth++) {
                int ret;

                ret = search_binary_handler(bprm);
                if (ret < 0)
                        return ret;

                /* Continue searching for the next binary handler? */
                if (ret > 0)
                        continue;

                /* Success! */
                audit_bprm(bprm);
                trace_sched_process_exec(current, old_pid, bprm);
                ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
                proc_exec_connector(current);
                return 0;
        }
        return -ELOOP;

(if I read the logic of exec_binprm() right - I might have missed something).

                Linus

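Linus's loop can be checked with a stub handler. In this hypothetical model, model_search() stands in for search_binary_handler() and returns 1 while interpreters remain. The succeeded flag stands in for the audit, trace, ptrace and proc notifications. The model also surfaces one detail, assuming it is faithful. The `depth < 5` bound allows five passes (four interpreter hops), whereas the patch's do/while allows six passes before returning -ELOOP. Keeping the existing limit would therefore require the bound `depth <= 5`.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the suggested loop shape; not kernel code.
 */
struct model_bprm {
	int hops_left;	/* interpreters still to swap in */
	int passes;	/* calls to the search stand-in */
	int succeeded;	/* stands in for audit/trace/ptrace/proc */
};

static int model_search(struct model_bprm *b)
{
	b->passes++;
	if (b->hops_left == 0)
		return 0;
	b->hops_left--;
	return 1;
}

static int model_exec_binprm(struct model_bprm *b)
{
	int depth;

	for (depth = 0; depth < 5; depth++) {
		int ret;

		ret = model_search(b);
		if (ret < 0)
			return ret;

		/* Continue searching for the next binary handler? */
		if (ret > 0)
			continue;

		/* Success! audit_bprm() and friends would run here. */
		b->succeeded = 1;
		return 0;
	}
	return -ELOOP;
}
```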

* Re: [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file
  2020-05-09 20:12         ` Eric W. Biederman
@ 2020-05-09 20:19           ` Linus Torvalds
  0 siblings, 0 replies; 122+ messages in thread
From: Linus Torvalds @ 2020-05-09 20:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	LSM List, James Morris, Serge E. Hallyn, Andy Lutomirski

On Sat, May 9, 2020 at 1:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> I agree something needs to be renamed, to remove confusion.

Yeah, the alternative is to rename the capability version. I don't
care much which way it goes, although I do think it's best to call out
explicitly that the security hook functions get only the "primary"
executable bprm info.

Which is why I'd prefer to just rename all those low-level security
cases. It makes for a slightly bigger patch, but I think it makes for
better readability, and makes it explicit that that hook is literally
just for the primary executable, not for the interpreter or whatever.

               Linus


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-09 19:41     ` [PATCH 3/5] exec: Remove recursion from search_binary_handler Eric W. Biederman
  2020-05-09 20:16       ` Linus Torvalds
@ 2020-05-10  4:22       ` Tetsuo Handa
  2020-05-10 19:38         ` Linus Torvalds
  1 sibling, 1 reply; 122+ messages in thread
From: Tetsuo Handa @ 2020-05-10  4:22 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On 2020/05/10 4:41, Eric W. Biederman wrote:
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -234,10 +234,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
>  	if (retval < 0)
>  		goto error;
>  
> -	retval = search_binary_handler(bprm);
> -	if (retval < 0)
> -		goto error;
> -
> +	retval = 1; /* Search for the interpreter */
>  ret:
>  	dput(fmt->dentry);
>  	return retval;

Wouldn't this change cause

	if (fd_binary > 0)
		ksys_close(fd_binary);
	bprm->interp_flags = 0;
	bprm->interp_data = 0;

not to be called when "Search for the interpreter" failed?

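Tetsuo's point is that load_misc_binary() is not a pure tail call: it has cleanup that must run if the nested search fails. A hypothetical userspace model makes the difference visible. open_fds stands in for fd_binary having been installed, interp_flags stands in for bprm->interp_flags, and the interpreter search is forced to fail with -ENOEXEC.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the binfmt_misc error path; not kernel code.
 * The interpreter search always fails.
 */
struct model_bprm {
	int open_fds;
	unsigned int interp_flags;
};

static int model_search_fails(struct model_bprm *b)
{
	(void)b;
	return -ENOEXEC;	/* no handler accepted the interpreter */
}

/* Old shape: the nested search runs inside the loader. */
static int model_load_misc_recursive(struct model_bprm *b)
{
	int retval;

	b->open_fds++;			/* fd_binary installed */
	b->interp_flags = 1;
	retval = model_search_fails(b);
	if (retval < 0) {		/* the "error:" path */
		b->open_fds--;		/* ksys_close(fd_binary) */
		b->interp_flags = 0;
	}
	return retval;
}

/* Loop shape: the loader returns 1 and the caller searches afterwards. */
static int model_load_misc_loop(struct model_bprm *b)
{
	b->open_fds++;
	b->interp_flags = 1;
	return 1;			/* "Search for the interpreter" */
}

static int model_exec_loop(struct model_bprm *b)
{
	int ret = model_load_misc_loop(b);

	if (ret > 0)
		ret = model_search_fails(b);	/* loader's error path never runs */
	return ret;
}
```

In the recursive form, the error path undoes both changes. In the loop form, the caller sees the failure only after the loader has already returned, so nothing undoes them.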

* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-10  4:22       ` Tetsuo Handa
@ 2020-05-10 19:38         ` Linus Torvalds
  2020-05-11 14:33           ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Linus Torvalds @ 2020-05-10 19:38 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric W. Biederman, Linux Kernel Mailing List, Oleg Nesterov,
	Jann Horn, Kees Cook, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, LSM List, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Wouldn't this change cause
>
>         if (fd_binary > 0)
>                 ksys_close(fd_binary);
>         bprm->interp_flags = 0;
>         bprm->interp_data = 0;
>
> not to be called when "Search for the interpreter" failed?

Good catch. We seem to have some subtle magic wrt the fd_binary file
descriptor, which depends on the recursive behavior.

I'm not seeing how to fix it cleanly with the "turn it into a loop".
Basically, that binfmt_misc use-case isn't really a tail-call.

Eric, ideas?

                 Linus


* Re: [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex
  2020-05-09 19:18     ` Linus Torvalds
  2020-05-09 19:57       ` Eric W. Biederman
@ 2020-05-10 20:33       ` Kees Cook
  1 sibling, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-10 20:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Linux Kernel Mailing List, Oleg Nesterov,
	Jann Horn, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton

On Sat, May 09, 2020 at 12:18:06PM -0700, Linus Torvalds wrote:
> On Fri, May 8, 2020 at 11:48 AM Eric W. Biederman <ebiederm@xmission.com> wrote:
> >
> >
> > Oleg modified the code that did
> > "mutex_lock_interruptible(&current->cred_guard_mutex)" to return
> > -ERESTARTNOINTR instead of -EINTR, so that userspace will never see a
> > failure to grab the mutex.
> >
> > Slightly earlier Liam R. Howlett defined mutex_lock_killable for
> > exactly the same situation but it does it a little more cleanly.
> 
> mutex_lock_interruptible() and mutex_lock_killable() are completely
> different operations, and the difference has absolutely nothing to do
> with  -ERESTARTNOINTR or -EINTR.
>
> [...]
> 
> And Kees, what the heck is that "Reviewed-by" for? Worthless review too.

Yeah, I messed that up; apologies. And I know exactly where my brain
misfired on this one. On a related note, I must stop doing code reviews
on Friday night. :)

-- 
Kees Cook


* Re: [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file
  2020-05-09 19:41     ` [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file Eric W. Biederman
  2020-05-09 20:07       ` Linus Torvalds
@ 2020-05-11  3:15       ` Kees Cook
  2020-05-11 16:52         ` Eric W. Biederman
  1 sibling, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-11  3:15 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Sat, May 09, 2020 at 02:41:17PM -0500, Eric W. Biederman wrote:
> 
> Now that security_bprm_set_creds is no longer responsible for calling
> cap_bprm_set_creds, security_bprm_set_creds only does something for
> the primary file that is being executed (not any interpreters it may
> have).  Therefore call security_bprm_set_creds from __do_execve_file,
> instead of from prepare_binprm so that it is only called once, and
> remove the now unnecessary called_set_creds field of struct binprm.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/exec.c                  | 11 +++++------
>  include/linux/binfmts.h    |  6 ------
>  security/apparmor/domain.c |  3 ---
>  security/selinux/hooks.c   |  2 --
>  security/smack/smack_lsm.c |  3 ---
>  security/tomoyo/tomoyo.c   |  6 ------
>  6 files changed, 5 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 765bfd51a546..635b5085050c 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1635,12 +1635,6 @@ int prepare_binprm(struct linux_binprm *bprm)
>  
>  	bprm_fill_uid(bprm);
>  
> -	/* fill in binprm security blob */
> -	retval = security_bprm_set_creds(bprm);
> -	if (retval)
> -		return retval;
> -	bprm->called_set_creds = 1;
> -
>  	retval = cap_bprm_set_creds(bprm);
>  	if (retval)
>  		return retval;
> @@ -1858,6 +1852,11 @@ static int __do_execve_file(int fd, struct filename *filename,
>  	if (retval < 0)
>  		goto out;
>  
> +	/* fill in binprm security blob */
> +	retval = security_bprm_set_creds(bprm);
> +	if (retval)
> +		goto out;
> +
>  	retval = prepare_binprm(bprm);
>  	if (retval < 0)
>  		goto out;
> 

Here I go with a Sunday night review, so hopefully I'm thinking better
than Friday night's review, but I *think* this patch is broken from
the LSM sense of the world in that security_bprm_set_creds() is getting
called _before_ the creds actually get fully set (in prepare_binprm()
by the calls to bprm_fill_uid(), cap_bprm_set_creds(), and
check_unsafe_exec()).

As a specific example, see the setting of LSM_UNSAFE_NO_NEW_PRIVS in
bprm->unsafe during check_unsafe_exec(), which must happen after
bprm_fill_uid(bprm) and cap_bprm_set_creds(bprm), to have a "true" view
of the execution privileges. AppArmor checks for this flag in its
security_bprm_set_creds() hook, as do SELinux, Smack, etc.

The security_bprm_set_creds() boundary for LSM is to see the "final"
state of the process privileges, and that needs to happen after
bprm_fill_uid(), cap_bprm_set_creds(), and check_unsafe_exec() have all
finished.

So, as it stands, I don't think this will work, but perhaps it can still
be rearranged to avoid the called_set_creds silliness. I'll look more
this week...

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-10 19:38         ` Linus Torvalds
@ 2020-05-11 14:33           ` Eric W. Biederman
  2020-05-11 19:10             ` Rob Landley
  2020-05-11 21:55             ` Kees Cook
  0 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-11 14:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tetsuo Handa, Linux Kernel Mailing List, Oleg Nesterov,
	Jann Horn, Kees Cook, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, LSM List, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> Wouldn't this change cause
>>
>>         if (fd_binary > 0)
>>                 ksys_close(fd_binary);
>>         bprm->interp_flags = 0;
>>         bprm->interp_data = 0;
>>
>> not to be called when "Search for the interpreter" failed?
>
> Good catch. We seem to have some subtle magic wrt the fd_binary file
> descriptor, which depends on the recursive behavior.

Yes.  Tetsuo, I really appreciate you noticing this.  This is exactly
the kind of behavior I am trying to flush out and keep from being
hidden.

> I'm not seeing how to fix it cleanly with the "turn it into a loop".
> Basically, that binfmt_misc use-case isn't really a tail-call.

I have reservations about installing a new file descriptor before we
process the close-on-exec logic, and before the security modules close
any file descriptors that the new credentials no longer grant access to.

I haven't yet figured out how opening a file descriptor during exec
should fit into all of that.

What I do see is that interp_data is just a parameter that is smuggled
into the call of search_binary_handler.  And the next binary handler
needs to be binfmt_elf for it to make much sense, as only binfmt_elf
(and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.

So I think what needs to happen is to rename bprm->interp_data to
bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file
descriptor free_bprm's responsibility.
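
A minimal userspace sketch of that ownership rule, assuming a hypothetical
`struct bprm_model` whose `execfd` field stands in for the renamed
`bprm->interp_data` (with -1 meaning "no fd"). It is not kernel code; it
only shows why one cleanup point in a free_bprm()-like function also
covers the early-failure path Tetsuo pointed out:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical model: execfd replaces interp_data, and -1 means no fd
 * was opened, so no separate BINPRM_FLAGS_EXECFD-style flag is needed.
 */
struct bprm_model {
	int execfd;
};

static void bprm_model_init(struct bprm_model *bprm)
{
	bprm->execfd = -1;
}

/*
 * Plays the role of free_bprm(): runs on success and on every error
 * path, so no handler has to remember to close the fd itself.
 */
static void free_bprm_model(struct bprm_model *bprm)
{
	if (bprm->execfd >= 0) {
		close(bprm->execfd);
		bprm->execfd = -1;
	}
}

/*
 * Stand-in for a binfmt_misc-like handler that opens the binary and
 * then fails to find its interpreter.  It returns without closing
 * anything: cleanup is the caller's job.
 */
static int load_misc_model(struct bprm_model *bprm, const char *path)
{
	bprm->execfd = open(path, O_RDONLY);
	if (bprm->execfd < 0)
		return -errno;
	return -ENOENT;	/* "Search for the interpreter" failed */
}

/* Returns 1 if fd is no longer open in this process. */
static int fd_is_closed(int fd)
{
	return fcntl(fd, F_GETFD) == -1 && errno == EBADF;
}
```

Because the handler never closes the fd itself, there is no error path
on which it can leak; the trade-off is that the fd lives until the whole
bprm is torn down.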

I hope such a change will make it easier to see all of the pieces that
are interacting during exec.

I am still asking: is the installation of that file descriptor useful if
it is not passed to userspace as an AT_EXECFD note?

I will dig in and see what I can come up with.

Eric




* Re: [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file
  2020-05-11  3:15       ` Kees Cook
@ 2020-05-11 16:52         ` Eric W. Biederman
  2020-05-11 21:18           ` Kees Cook
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-11 16:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Sat, May 09, 2020 at 02:41:17PM -0500, Eric W. Biederman wrote:
>> 
>> Now that security_bprm_set_creds is no longer responsible for calling
>> cap_bprm_set_creds, security_bprm_set_creds only does something for
>> the primary file that is being executed (not any interpreters it may
>> have).  Therefore call security_bprm_set_creds from __do_execve_file,
>> instead of from prepare_binprm so that it is only called once, and
>> remove the now unnecessary called_set_creds field of struct binprm.
>> 
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  fs/exec.c                  | 11 +++++------
>>  include/linux/binfmts.h    |  6 ------
>>  security/apparmor/domain.c |  3 ---
>>  security/selinux/hooks.c   |  2 --
>>  security/smack/smack_lsm.c |  3 ---
>>  security/tomoyo/tomoyo.c   |  6 ------
>>  6 files changed, 5 insertions(+), 26 deletions(-)
>> 
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 765bfd51a546..635b5085050c 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1635,12 +1635,6 @@ int prepare_binprm(struct linux_binprm *bprm)
>>  
>>  	bprm_fill_uid(bprm);
>>  
>> -	/* fill in binprm security blob */
>> -	retval = security_bprm_set_creds(bprm);
>> -	if (retval)
>> -		return retval;
>> -	bprm->called_set_creds = 1;
>> -
>>  	retval = cap_bprm_set_creds(bprm);
>>  	if (retval)
>>  		return retval;
>> @@ -1858,6 +1852,11 @@ static int __do_execve_file(int fd, struct filename *filename,
>>  	if (retval < 0)
>>  		goto out;
>>  
>> +	/* fill in binprm security blob */
>> +	retval = security_bprm_set_creds(bprm);
>> +	if (retval)
>> +		goto out;
>> +
>>  	retval = prepare_binprm(bprm);
>>  	if (retval < 0)
>>  		goto out;
>> 
>
> Here I go with a Sunday night review, so hopefully I'm thinking better
> than Friday night's review, but I *think* this patch is broken from
> the LSM sense of the world in that security_bprm_set_creds() is getting
> called _before_ the creds actually get fully set (in prepare_binprm()
> by the calls to bprm_fill_uid(), cap_bprm_set_creds(), and
> check_unsafe_exec()).
>
> As a specific example, see the setting of LSM_UNSAFE_NO_NEW_PRIVS in
> bprm->unsafe during check_unsafe_exec(), which must happen after
> bprm_fill_uid(bprm) and cap_bprm_set_creds(bprm), to have a "true" view
> of the execution privileges. Apparmor checks for this flag in its
> security_bprm_set_creds() hook. Similarly do selinux, smack, etc...

I think you are getting prepare_binprm confused with prepare_bprm_creds.
Understandable given the similarity of their names.

> The security_bprm_set_creds() boundary for LSM is to see the "final"
> state of the process privileges, and that needs to happen after
> bprm_fill_uid(), cap_bprm_set_creds(), and check_unsafe_exec() have all
> finished.
>
> So, as it stands, I don't think this will work, but perhaps it can still
> be rearranged to avoid the called_set_creds silliness. I'll look more
> this week...

If you look at the flow of the code in __do_execve_file before this
change it is:

	prepare_bprm_creds()
	check_unsafe_exec()

	...

	prepare_binprm()
	    bprm_fill_uid()
	        bprm->cred->euid = current_euid()
	        bprm->cred->egid = current_egid()
	    security_bprm_set_creds()
	        for_each_lsm()
	            lsm->bprm_set_creds()
	                if (called_set_creds)
	                    return;
	                ...
	    bprm->called_set_creds = 1;
	...

	exec_binprm()
	    search_binary_handler()
	        security_bprm_check()
	            tomoyo_bprm_check_security()
	            ima_bprm_check()
	        load_script()
	            prepare_binprm()
	                /* called_set_creds already == 1 */
	                bprm_fill_uid()
	                security_bprm_set_creds()
	                    for_each_lsm()
	                        lsm->bprm_set_creds()
	                            if (called_set_creds)
	                                return;
	                            ...
	            search_binary_handler()
	                security_bprm_check()
	                load_elf_binary()
	                    ...
	                    setup_new_exec()
	                    ...


Assuming you are executing a shell script.

Now bprm_fill_uid is written with the assumption that it will be called
multiple times, and it reinitializes all of its variables each time.

As you can see above, the implementations of bprm_set_creds() only
really execute before called_set_creds is set, aka the first time.
They in no way see the final state.
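
The called_set_creds behaviour in the trace can be modeled in a few lines
of userspace C. The struct and function names below are hypothetical
stand-ins, and the counters exist only to make visible which bodies
actually run for a script with one interpreter:

```c
#include <assert.h>

/*
 * Hypothetical model of the pre-patch flow: prepare_binprm() runs once
 * for the script and again for each interpreter, but the LSM
 * bprm_set_creds hooks bail out once called_set_creds is set.
 */
struct bprm_model {
	unsigned int called_set_creds:1;
	int fill_uid_runs;	/* bprm_fill_uid() bodies executed */
	int lsm_hook_runs;	/* LSM bprm_set_creds() bodies executed */
};

static void lsm_bprm_set_creds(struct bprm_model *bprm)
{
	if (bprm->called_set_creds)
		return;		/* every pass after the first is a no-op */
	bprm->lsm_hook_runs++;
}

static void prepare_binprm_model(struct bprm_model *bprm)
{
	bprm->fill_uid_runs++;	/* reinitialized on every call */
	lsm_bprm_set_creds(bprm);
	bprm->called_set_creds = 1;
}

/* One pass for the file itself, plus one per interpreter level. */
static void exec_model(struct bprm_model *bprm, int interpreter_levels)
{
	prepare_binprm_model(bprm);
	for (int i = 0; i < interpreter_levels; i++)
		prepare_binprm_model(bprm);
}
```

With one interpreter level, the bprm_fill_uid() stand-in runs twice while
the LSM hook body runs once. Calling the hook exactly once from
__do_execve_file keeps that single run and makes the flag unnecessary;
the only difference is that it now runs before the first bprm_fill_uid().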

Further, when I looked at those hooks, they were not looking at the
values set by bprm_fill_uid at all.  They were busy with the values
they needed to set in that hook for their particular LSM.

So while in theory I can see the danger of moving the call above
bprm_fill_uid, I don't see anything in practice that would be a problem.

Further, by moving the call of security_bprm_set_creds out of
prepare_binprm into __do_execve_file just before the call of
prepare_binprm, I am just moving the call above bprm_fill_uid
and nothing else.

So I think you just confused prepare_bprm_creds with prepare_binprm,
as most of your criticisms would be valid in that case.  Can you take
a second look?

Thank you,
Eric


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-11 14:33           ` Eric W. Biederman
@ 2020-05-11 19:10             ` Rob Landley
  2020-05-13 21:59               ` Eric W. Biederman
  2020-05-11 21:55             ` Kees Cook
  1 sibling, 1 reply; 122+ messages in thread
From: Rob Landley @ 2020-05-11 19:10 UTC (permalink / raw)
  To: Eric W. Biederman, Linus Torvalds
  Cc: Tetsuo Handa, Linux Kernel Mailing List, Oleg Nesterov,
	Jann Horn, Kees Cook, Greg Ungerer, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, LSM List, James Morris, Serge E. Hallyn,
	Andy Lutomirski, dalias

On 5/11/20 9:33 AM, Eric W. Biederman wrote:
> What I do see is that interp_data is just a parameter that is smuggled
> into the call of search binary handler.  And the next binary handler
> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.

The binfmt_elf_fdpic driver is separate from binfmt_elf for the same reason
ext2/ext3/ext4 used to have 3 drivers: fdpic is really just binfmt_elf with the
4 main sections (text, data, bss, rodata) able to move independently of each
other (each tracked with its own base pointer).
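
To make "each tracked with its own base pointer" concrete, here is a
small sketch contrasting one load bias for the whole image with a
per-segment lookup. The `seg_model` layout is only loosely inspired by
the FDPIC load map, and the addresses used with it are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-segment record: where the segment was linked
 * (p_vaddr), how big it is in memory (p_memsz), and where it landed.
 */
struct seg_model {
	uint32_t addr;
	uint32_t p_vaddr;
	uint32_t p_memsz;
};

/*
 * Ordinary ELF/PIE: one bias shifts every segment together, so the
 * distance between text and data is fixed at link time.
 */
static uint32_t reloc_single_bias(uint32_t vaddr, uint32_t bias)
{
	return vaddr + bias;
}

/*
 * FDPIC-style: find the segment containing vaddr and apply that
 * segment's own displacement.  Returns 0 if vaddr is in no segment.
 */
static uint32_t reloc_per_segment(const struct seg_model *segs, size_t n,
				  uint32_t vaddr)
{
	for (size_t i = 0; i < n; i++) {
		if (vaddr >= segs[i].p_vaddr &&
		    vaddr - segs[i].p_vaddr < segs[i].p_memsz)
			return segs[i].addr + (vaddr - segs[i].p_vaddr);
	}
	return 0;
}
```

With a single bias, text and data keep their link-time distance; with
per-segment bases that distance is whatever the loader chose, which is
the extra freedom that both no-MMU loading and ASLR care about.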

It's kind of -fPIE on steroids, and various security people have sniffed at it
over the years to give ASLR more degrees of freedom on with-MMU systems. Many
moons ago Rich Felker proposed teaching the fdpic loader how to load normal ELF
binaries so there's just the one loader (there's a flag in the ELF header to say
whether the sections are independent or not).

Rob


* Re: [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file
  2020-05-11 16:52         ` Eric W. Biederman
@ 2020-05-11 21:18           ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-11 21:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 11, 2020 at 11:52:41AM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Sat, May 09, 2020 at 02:41:17PM -0500, Eric W. Biederman wrote:
> >> 
> >> Now that security_bprm_set_creds is no longer responsible for calling
> >> cap_bprm_set_creds, security_bprm_set_creds only does something for
> >> the primary file that is being executed (not any interpreters it may
> >> have).  Therefore call security_bprm_set_creds from __do_execve_file,
> >> instead of from prepare_binprm so that it is only called once, and
> >> remove the now unnecessary called_set_creds field of struct binprm.
> >> 
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> ---
> >>  fs/exec.c                  | 11 +++++------
> >>  include/linux/binfmts.h    |  6 ------
> >>  security/apparmor/domain.c |  3 ---
> >>  security/selinux/hooks.c   |  2 --
> >>  security/smack/smack_lsm.c |  3 ---
> >>  security/tomoyo/tomoyo.c   |  6 ------
> >>  6 files changed, 5 insertions(+), 26 deletions(-)
> >> 
> >> diff --git a/fs/exec.c b/fs/exec.c
> >> index 765bfd51a546..635b5085050c 100644
> >> --- a/fs/exec.c
> >> +++ b/fs/exec.c
> >> @@ -1635,12 +1635,6 @@ int prepare_binprm(struct linux_binprm *bprm)
> >>  
> >>  	bprm_fill_uid(bprm);
> >>  
> >> -	/* fill in binprm security blob */
> >> -	retval = security_bprm_set_creds(bprm);
> >> -	if (retval)
> >> -		return retval;
> >> -	bprm->called_set_creds = 1;
> >> -
> >>  	retval = cap_bprm_set_creds(bprm);
> >>  	if (retval)
> >>  		return retval;
> >> @@ -1858,6 +1852,11 @@ static int __do_execve_file(int fd, struct filename *filename,
> >>  	if (retval < 0)
> >>  		goto out;
> >>  
> >> +	/* fill in binprm security blob */
> >> +	retval = security_bprm_set_creds(bprm);
> >> +	if (retval)
> >> +		goto out;
> >> +
> >>  	retval = prepare_binprm(bprm);
> >>  	if (retval < 0)
> >>  		goto out;
> >> 
> >
> > Here I go with a Sunday night review, so hopefully I'm thinking better
> > than Friday night's review, but I *think* this patch is broken from
> > the LSM sense of the world in that security_bprm_set_creds() is getting
> > called _before_ the creds actually get fully set (in prepare_binprm()
> > by the calls to bprm_fill_uid(), cap_bprm_set_creds(), and
> > check_unsafe_exec()).
> >
> > As a specific example, see the setting of LSM_UNSAFE_NO_NEW_PRIVS in
> > bprm->unsafe during check_unsafe_exec(), which must happen after
> > bprm_fill_uid(bprm) and cap_bprm_set_creds(bprm), to have a "true" view
> > of the execution privileges. Apparmor checks for this flag in its
> > security_bprm_set_creds() hook. Similarly do selinux, smack, etc...
> 
> I think you are getting prepare_binprm confused with prepare_bprm_creds.
> Understandable given the similarity of their names.

I fixated on a bad example, having confused myself about when
check_unsafe_exec() happens. My original concern (with the bad example)
was that the LSM is having security_bprm_set_creds() called before the
new cred in bprm->cred has been initialized with all the correct uid/gid,
caps, and associated flags.

But anything associated with capabilities should be confined to the
commoncap LSM, though there is "leakage" into the uid/gid states and some
bprm state (more on this later). That said, as you also found, I can't
find any LSM that examines those fields of the cred (I had stopped this
research last night when I saw check_unsafe_exec() and confused myself);
they're all looking at other bprm state not associated with caps and uid
changes (file, unsafe_exec, security field of new cred, etc). So that's
very good! That means we've actually kept a bright line between things
here -- whew.

> > The security_bprm_set_creds() boundary for LSM is to see the "final"
> > state of the process privileges, and that needs to happen after
> > bprm_fill_uid(), cap_bprm_set_creds(), and check_unsafe_exec() have all
> > finished.
> >
> > So, as it stands, I don't think this will work, but perhaps it can still
> > be rearranged to avoid the called_set_creds silliness. I'll look more
> > this week...
> 
> If you look at the flow of the code in __do_execve_file before this
> change it is:
> 
> 	prepare_bprm_creds()
>         check_unsafe_exec()
> 
> 	...
> 
>         prepare_binprm()
>         	bprm_file_uid()

(bprm_fill_uid(), but yes)

>                 	bprm->cred->euid = current_euid()
>                         bprm->cred->egid = current_egid()
> 		security_bprm_set_creds()
>                 	for_each_lsm()
>                         	lsm->bprm_set_creds()
>                                 	if (called_set_creds)
>                                         	return;
>                                         ...
> 		bprm->called_set_creds = 1;
> 	...
> 
> 	exec_binprm()
>         	search_binary_handler()
>                 	security_bprm_check()
>                         	tomoyo_bprm_check_security()
>                                 ima_bprm_check()
>    			load_script()
>                         	prepare_binprm()
>                                 	/* called_set_creds already == 1 */
>                                 	bprm_file_uid()
>                                         security_bprm_set_creds()
> 			                	for_each_lsm()
> 			                        	lsm->bprm_set_creds()
> 		                                	if (called_set_creds)
>                 		                        	return;
>                                 		        ...
>                                 search_binary_handler()
>                                 	security_bprm_check_security()
>                                         load_elf_binary()
>                                         	...
>                                                 setup_new_exec
>                                                 ...
> 
> 
> Assuming you are executing a shell script.
> 
> Now bprm_file_uid is written with the assumption that it will be called
> multiple times and it reinitializes all of it's variables each time.

Right -- and the same is true for cap_bprm_set_creds() (in that
it needs to be run multiple times and depends on the work done in
bprm_fill_uid()). If we encounter a future use-case for having other
LSMs call out here multiple times, we can introduce a new LSM hook.

> As you can see in above the implementations of bprm_set_creds() only
> really execute before called_set_creds is set, aka the first time.
> They in no way see the final state.
> 
> Further when I looked as those hooks they were not looking at the values
> set by bprm_file_uid at all.  There were busy with the values their
> they needed to set in that hook for their particular lsm.

Agreed (though I'd love some other LSM eyes on this conclusion).

> So while in theory I can see the danger of moving above bprm_file_uid
> I don't see anything in practice that would be a problem.
> 
> Further by moving the call of security_bprm_set_creds out of
> prepare_binprm int __do_execve_file just before the call of
> prepare_binprm I am just moving the call above binprm_fill_uid
> and nothing else.
> 
> So I think you just confused prepare_bprm_creds with prepare_binprm.
> As most of your criticisms appear valid in that case.  Can you take a
> second look?

So, in earlier attempts to clean up code near all this, I removed the
LSM's bprm_secureexec hook, which only commoncap was using to impart
details about privilege elevation. I switched the semantics to having LSMs
set bprm->secureexec to true (but never to zero). Since commoncap's idea
of "was I elevated?" might repeatedly change, I had to store its results
"privately" in the bprm, which got us cap_elevated (in 46d98eb4e1d2):

c425e189ffd7 ("binfmt: Introduce secureexec flag")
993b3ab0642e ("apparmor: Refactor to remove bprm_secureexec hook")
62874c3adf70 ("selinux: Refactor to remove bprm_secureexec hook")
46d98eb4e1d2 ("commoncap: Refactor to remove bprm_secureexec hook")
ee67ae7ef6ff ("commoncap: Move cap_elevated calculation into bprm_set_creds")
2af622802696 ("LSM: drop bprm_secureexec hook")

So, given the special-case nature of capabilities here, this does seem
to be the right choice (assuming we're not missing something in the
other LSMs). As such, I think the comment for cap_elevated needs to be
updated to reflect the change to function call flow, and to specify it
cannot be used by the other LSMs. Maybe something like:

		/*
		 * True if most recent call to cap_bprm_set_creds()
		 * (due to multiple prepare_binprm() calls from the
		 * binfmt_script/misc handlers) resulted in elevated
		 * privileges. This is used internally by fs/exec.c
		 * to set bprm->secureexec.
		 */
		cap_elevated:1,
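
A compact model of that split, under the assumption that LSMs only ever
OR into `secureexec` while commoncap overwrites `cap_elevated` on each
prepare_binprm() pass. All names are hypothetical, and the one-time merge
stands in for what fs/exec.c does after the last pass:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the secureexec / cap_elevated split. */
struct bprm_model {
	bool secureexec;
	bool cap_elevated;
};

/* An LSM that wants a sanitized environment sets the flag, never clears it. */
static void lsm_set_creds(struct bprm_model *bprm, bool wants_secure)
{
	if (wants_secure)
		bprm->secureexec = true;
}

/* commoncap: overwritten on each pass (script, then interpreter, ...). */
static void cap_set_creds(struct bprm_model *bprm, bool elevated)
{
	bprm->cap_elevated = elevated;
}

/* Done once, after the final pass: only the last cap result counts. */
static void merge_secureexec(struct bprm_model *bprm)
{
	bprm->secureexec |= bprm->cap_elevated;
}
```

For example, an early pass over a setuid script may look elevated while
the interpreter that actually runs is not; recomputing `cap_elevated`
each pass and merging only once means that stale result never reaches
`secureexec`, whereas an LSM's request for secure mode is never undone.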

And that brings us to naming. Whee. I think we should make the following
name changes:

bprm_fill_uid      ->	bprm_establish_privileges
cap_bprm_set_creds ->	cap_establish_privileges

Finally, I think we should update the comment on bprm_set_creds (which,
actually, I think is the correct name now) to something like:

 * @bprm_set_creds:
 *	Save security information in the @bprm->cred->security field,
 *	typically based on information about the bprm->file, for later
 *	use during the @bprm_committing_creds hook. Specifically, the
 *	credentials themselves (uid, gid, etc.) are not finalized
 *	yet and must not be examined until the @bprm_committing_creds
 *	hook.
 *	This hook is called once, after the creds structure has been
 *	allocated.
 *	The hook must set @bprm->secureexec to 1 if a "secure exec"
 *	has happened as a result of this hook call. The flag is used to
 *	indicate the need for a sanitized execution environment, and is
 *	also passed in the ELF auxiliary table on the initial stack to
 *	indicate whether libc should enable secure mode.
 *	This hook may also optionally check LSM-specific permissions
 *	(e.g. for transitions between security domains).
 *	@bprm contains the linux_binprm structure.
 *	Return 0 if the hook is successful and permission is granted.

-Kees

-- 
Kees Cook


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-11 14:33           ` Eric W. Biederman
  2020-05-11 19:10             ` Rob Landley
@ 2020-05-11 21:55             ` Kees Cook
  2020-05-12 18:42               ` Eric W. Biederman
  1 sibling, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-11 21:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

On Mon, May 11, 2020 at 09:33:21AM -0500, Eric W. Biederman wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >> Wouldn't this change cause
> >>
> >>         if (fd_binary > 0)
> >>                 ksys_close(fd_binary);
> >>         bprm->interp_flags = 0;
> >>         bprm->interp_data = 0;
> >>
> >> not to be called when "Search for the interpreter" failed?
> >
> > Good catch. We seem to have some subtle magic wrt the fd_binary file
> > descriptor, which depends on the recursive behavior.
> 
> Yes.  I Tetsuo I really appreciate you noticing this.  This is exactly
> the kind of behavior I am trying to flush out and keep from being
> hidden.
> 
> > I'm not seeing how to fix it cleanly with the "turn it into a loop".
> > Basically, that binfmt_misc use-case isn't really a tail-call.
> 
> I have reservations about installing a new file descriptor before
> we process the close on exec logic and the related security modules
> closing file descriptors that your new credentials no longer give
> you access to logic.

Hm, this does feel odd. In looking at this, it seems like this file
never gets close-on-exec set, and doesn't have its flags changed from
its original open:
                .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
Only the UMH path through exec doesn't explicitly open a file by name,
from what I can see, so we'll only have these flags.
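
For reference, a userspace illustration of what "gets close-on-exec set"
means: an fd opened without `O_CLOEXEC` has `FD_CLOEXEC` clear and
survives execve() unless the flag is set later with `fcntl(F_SETFD)`.
This says nothing about how the kernel installs the binfmt_misc fd; the
helper names are hypothetical and only show the two states:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns true if fd will be closed across execve(). */
static bool fd_has_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	assert(flags != -1);
	return flags & FD_CLOEXEC;
}

/* Opens path read-only, optionally with O_CLOEXEC set atomically. */
static int open_ro(const char *path, bool cloexec)
{
	return open(path, O_RDONLY | (cloexec ? O_CLOEXEC : 0));
}

/* Marks an already-open fd close-on-exec after the fact. */
static int set_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```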

> I haven't yet figured out how opening a file descriptor during exec
> should fit into all of that.
> 
> What I do see is that interp_data is just a parameter that is smuggled
> into the call of search binary handler.  And the next binary handler
> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.
> 
> So I think what needs to happen is to rename bprm->interp_data to
> bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file
> descriptor free_bprm's responsiblity.

Yeah, I would agree. As far as the close handling, I don't think there
is a difference here: interp_data was closed on the binfmt_misc.c
error path, and in the new world it would be the exec error path -- both
would be under the original credentials.

> I hope such a change will make it easier to see all of the pieces that
> are intereacting during exec.

Right -- I'm not sure which piece should "consume" bprm->execfd though,
which I think is what you're asking next...

> I am still asking: is the installation of that file descriptor useful if
> it is not exported passed to userspace as an AT_EXECFD note?
> 
> I will dig in and see what I can come up with.

Should binfmt_misc do the install, or can the consuming binfmt do it?
i.e. when binfmt_elf sees bprm->execfd, does it perform the install
instead?

-- 
Kees Cook


* Re: [PATCH 4/5] exec: Allow load_misc_binary to call prepare_binfmt unconditionally
  2020-05-09 19:42     ` [PATCH 4/5] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
@ 2020-05-11 22:09       ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-11 22:09 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Sat, May 09, 2020 at 02:42:23PM -0500, Eric W. Biederman wrote:
> 
> Add a flag preserve_creds that binfmt_misc can set to prevent
> credentials from being updated.  This allows binfmrt_misc to always
> call prepare_binfmt.  Allowing the credential computation logic to be
> consolidated.
> 
> Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  fs/binfmt_misc.c        | 15 +++------------
>  fs/exec.c               | 14 +++++++++-----
>  include/linux/binfmts.h |  2 ++
>  3 files changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
> index 127fae9c21ab..16bfafd2671d 100644
> --- a/fs/binfmt_misc.c
> +++ b/fs/binfmt_misc.c
> @@ -218,19 +218,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
>  		goto error;
>  
>  	bprm->file = interp_file;
> -	if (fmt->flags & MISC_FMT_CREDENTIALS) {
> -		loff_t pos = 0;
> -
> -		/*
> -		 * No need to call prepare_binprm(), it's already been
> -		 * done.  bprm->buf is stale, update from interp_file.
> -		 */
> -		memset(bprm->buf, 0, BINPRM_BUF_SIZE);
> -		retval = kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE,
> -				&pos);
> -	} else
> -		retval = prepare_binprm(bprm);
> +	if (fmt->flags & MISC_FMT_CREDENTIALS)
> +		bprm->preserve_creds = 1;
>  
> +	retval = prepare_binprm(bprm);
>  	if (retval < 0)
>  		goto error;
>  
> diff --git a/fs/exec.c b/fs/exec.c
> index 8bbf5fa785a6..01dbeb025c46 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1630,14 +1630,18 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
>   */
>  int prepare_binprm(struct linux_binprm *bprm)
>  {
> -	int retval;
>  	loff_t pos = 0;
>  
> -	bprm_fill_uid(bprm);
> +	if (!bprm->preserve_creds) {

nit: hint this to the common execution path:

	if (likely(!bprm->preserve_creds)) {

> +		int retval;
>  
> -	retval = cap_bprm_set_creds(bprm);
> -	if (retval)
> -		return retval;
> +		bprm_fill_uid(bprm);
> +
> +		retval = cap_bprm_set_creds(bprm);
> +		if (retval)
> +			return retval;
> +	}
> +	bprm->preserve_creds = 0;
>  
>  	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
>  	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 89f1135dcb75..cb016f001e7a 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -26,6 +26,8 @@ struct linux_binprm {
>  	unsigned long p; /* current top of mem */
>  	unsigned long argmin; /* rlimit marker for copy_strings() */
>  	unsigned int
> +		/* Don't update the creds for an interpreter (see binfmt_misc) */

I'd like a much more verbose comment here. How about this:

		/*
		 * Skip setting new privileges for an interpreter (see
		 * binfmt_misc) on the next call to prepare_binprm().
		 */

> +		preserve_creds:1,

Nit pick: we've seen there is a logical difference here between "creds"
(which mean "the creds struct itself") and "privileges" (which are
stored in the cred struct). I think we should reinforce this distinction
here and name this:

		preserve_privileges:1,

>  		/*
>  		 * True if most recent call to the commoncaps bprm_set_creds
>  		 * hook (due to multiple prepare_binprm() calls from the
> -- 
> 2.25.0
> 

Otherwise, yeah, this seems okay to me.
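
A userspace sketch of the one-shot semantics of the new flag, as read
from the diff above: privileges are recomputed unless the flag is set,
the flag is cleared on every call, and the header buffer is always
re-read. The struct, the buffer size, and the string standing in for
bprm->file are hypothetical; `likely()` uses the usual GCC/Clang
`__builtin_expect` form:

```c
#include <assert.h>
#include <string.h>

/* Branch hint in the usual GCC/Clang form. */
#define likely(x) __builtin_expect(!!(x), 1)

#define BUF_SIZE_MODEL 16	/* hypothetical, not BINPRM_BUF_SIZE */

struct bprm_model {
	unsigned int preserve_creds:1;
	int privilege_passes;	/* times privileges were (re)computed */
	char buf[BUF_SIZE_MODEL];
	const char *file;	/* stand-in for bprm->file's contents */
};

/*
 * Model of the patched prepare_binprm(): recompute privileges unless
 * the caller asked to keep them, then clear the request so it only
 * applies to this one call.  The header buffer is always refreshed.
 */
static int prepare_binprm_model(struct bprm_model *bprm)
{
	if (likely(!bprm->preserve_creds))
		bprm->privilege_passes++;	/* bprm_fill_uid() + caps */
	bprm->preserve_creds = 0;

	memset(bprm->buf, 0, sizeof(bprm->buf));
	strncpy(bprm->buf, bprm->file, sizeof(bprm->buf) - 1);
	return 0;
}
```

Because the flag is cleared inside the function, a MISC_FMT_CREDENTIALS
handler affects only the very next call; any deeper interpreter gets its
privileges computed normally again.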

-- 
Kees Cook


* Re: [PATCH 5/5] exec: Move the call of prepare_binprm into search_binary_handler
  2020-05-09 19:42     ` [PATCH 5/5] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
@ 2020-05-11 22:24       ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-11 22:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Sat, May 09, 2020 at 02:42:52PM -0500, Eric W. Biederman wrote:
> 
> The code in prepare_binary_handler needs to be run every time
> search_binary_handler is called so move the call into search_binary_handler
> itself to make the code simpler and easier to understand.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Yes, nice. :) I don't see any ordering dependencies here. The only thing
I see is a potential for more "work done by kernel before bailing" in
the sense that the arg copying will be performed before we check the
kernel_read() result. I struggle to see how that might be a problem,
and this gets us to fewer exec.c exports. Yay!
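
A sketch of the resulting shape, assuming the recursion has already
become a loop in the caller: prepare_binprm() runs at the top of every
search_binary_handler() call, so each interpreter level re-reads its
header exactly once. The depth cap of 5, the "#!"/"ELF" string formats,
and all names are illustrative assumptions:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_DEPTH_MODEL 5	/* hypothetical cap on interpreter nesting */

struct bprm_model {
	const char *file;	 /* current file's "contents" */
	const char *interpreter; /* set by a script-like handler */
	int prepare_calls;
};

static int prepare_binprm_model(struct bprm_model *bprm)
{
	bprm->prepare_calls++;	/* re-reads the header of bprm->file */
	return 0;
}

/* "#!<rest>" hands off to <rest>; "ELF" is a final format. */
static int search_binary_handler_model(struct bprm_model *bprm)
{
	int ret = prepare_binprm_model(bprm);	/* now done here, every time */

	if (ret < 0)
		return ret;
	if (strncmp(bprm->file, "#!", 2) == 0) {
		bprm->interpreter = bprm->file + 2;
		return 1;	/* ask the caller to go around again */
	}
	if (strcmp(bprm->file, "ELF") == 0)
		return 0;
	return -ENOEXEC;
}

/* Loop instead of recursion: each interpreter level is one iteration. */
static int exec_binprm_model(struct bprm_model *bprm)
{
	for (int depth = 0; depth <= MAX_DEPTH_MODEL; depth++) {
		int ret = search_binary_handler_model(bprm);

		if (ret <= 0)
			return ret;
		bprm->file = bprm->interpreter;	/* like swapping bprm->file */
		bprm->interpreter = NULL;
	}
	return -ELOOP;
}
```

An early failure returns straight out of the loop, so per-level cleanup
(such as the binfmt_misc fd discussed in this thread) has to be owned by
the caller rather than by the handler.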

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-11 21:55             ` Kees Cook
@ 2020-05-12 18:42               ` Eric W. Biederman
  2020-05-12 19:25                 ` Kees Cook
  2020-05-13  0:20                 ` Linus Torvalds
  0 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-12 18:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Mon, May 11, 2020 at 09:33:21AM -0500, Eric W. Biederman wrote:
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> 
>> > On Sat, May 9, 2020 at 9:30 PM Tetsuo Handa
>> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
>> >>
>> >> Wouldn't this change cause
>> >>
>> >>         if (fd_binary > 0)
>> >>                 ksys_close(fd_binary);
>> >>         bprm->interp_flags = 0;
>> >>         bprm->interp_data = 0;
>> >>
>> >> not to be called when "Search for the interpreter" failed?
>> >
>> > Good catch. We seem to have some subtle magic wrt the fd_binary file
>> > descriptor, which depends on the recursive behavior.
>> 
>> Yes.  I Tetsuo I really appreciate you noticing this.  This is exactly
>> the kind of behavior I am trying to flush out and keep from being
>> hidden.
>> 
>> > I'm not seeing how to fix it cleanly with the "turn it into a loop".
>> > Basically, that binfmt_misc use-case isn't really a tail-call.
>> 
>> I have reservations about installing a new file descriptor before
>> we process the close-on-exec logic, and before the security modules
>> close the file descriptors that your new credentials no longer give
>> you access to.
>
> Hm, this does feel odd. In looking at this, it seems like this file
> never gets close-on-exec set, and doesn't have its flags changed from
> its original open:
>                 .open_flag = O_LARGEFILE | O_RDONLY | __FMODE_EXEC,
> Only the UMH path through exec doesn't explicitly open a file by name,
> from what I can see, so we'll only have these flags.
>
>> I haven't yet figured out how opening a file descriptor during exec
>> should fit into all of that.
>> 
>> What I do see is that interp_data is just a parameter that is smuggled
>> into the call of search binary handler.  And the next binary handler
>> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
>> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.
>> 
>> So I think what needs to happen is to rename bprm->interp_data to
>> bprm->execfd, remove BINPRM_FLAGS_EXECFD and make closing that file
>> descriptor free_bprm's responsibility.
>
> Yeah, I would agree. As far as the close handling, I don't think there
> is a difference here: interp_data was closed on the binfmt_misc.c
> error path, and in the new world it would be the exec error path -- both
> would be under the original credentials.
>
>> I hope such a change will make it easier to see all of the pieces that
>> are interacting during exec.
>
> Right -- I'm not sure which piece should "consume" bprm->execfd though,
> which I think is what you're asking next...
>
>> I am still asking: is the installation of that file descriptor useful if
>> it is not passed to userspace as an AT_EXECFD note?
>> 
>> I will dig in and see what I can come up with.
>
> Should binfmt_misc do the install, or can the consuming binfmt do it?
> i.e. when binfmt_elf sees bprm->execfd, does it perform the install
> instead?

I am still thinking about this one, but here is where I am at.  At a
practical level, passing the file descriptor of the script to the interpreter
seems like something we should encourage in the long term.  It removes
races and it is cheaper because then the interpreter does not have to
turn around and open the script itself.
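The race that passing the fd removes can be shown in userspace. This is a minimal sketch, assuming a Linux system with a writable /tmp; `demo_race`, `write_file` and `read_from_fd` are hypothetical helpers, not kernel code. The "kernel" opens the script once, an "attacker" renames a different file over the same path, and only the already-open descriptor still reads the original contents.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write `text` to `path`, replacing any existing file. Returns 0 on success. */
static int write_file(const char *path, const char *text)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	size_t len = strlen(text);
	int ret = (write(fd, text, len) == (ssize_t)len) ? 0 : -1;
	close(fd);
	return ret;
}

/* Read up to size-1 bytes from offset 0 of fd into buf, NUL-terminated. */
static int read_from_fd(int fd, char *buf, size_t size)
{
	ssize_t n = pread(fd, buf, size - 1, 0);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return 0;
}

/*
 * Model of the race: open the script (what the kernel would hand over as
 * an fd), atomically rename a decoy over the same path, then compare what
 * the open fd sees with what a reopen-by-name sees.
 */
static int demo_race(const char *path, const char *decoy,
		     char *via_fd, char *via_path, size_t size)
{
	if (write_file(path, "original\n") || write_file(decoy, "replaced\n"))
		return -1;

	int execfd = open(path, O_RDONLY);   /* kept open, like AT_EXECFD */
	if (execfd < 0)
		return -1;

	if (rename(decoy, path) < 0) {       /* the attacker's swap */
		close(execfd);
		return -1;
	}

	int reopened = open(path, O_RDONLY); /* interpreter reopening by name */
	if (reopened < 0) {
		close(execfd);
		return -1;
	}

	int failed = read_from_fd(execfd, via_fd, size) ||
		     read_from_fd(reopened, via_path, size);
	close(execfd);
	close(reopened);
	return failed ? -1 : 0;
}
```

After the swap, `via_fd` still holds "original\n" while `via_path` holds "replaced\n": the unlinked original stays reachable only through the descriptor that was opened before the swap.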

Strictly speaking, binfmt_misc should not need to close the file
descriptor, because we have already unshared the files struct and
reset_files_struct should handle restoring it.
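The error-path argument can be modelled in a few lines. This is a toy sketch assuming the flow described above (unshare the files table, install into the copy, restore the saved table on failure); `files_model`, `model_fd_install` and `model_exec` are hypothetical names that only stand in for unshare_files() and reset_files_struct().

```c
#include <stdbool.h>

#define MODEL_MAX_FDS 8

/* Hypothetical stand-in for a files_struct: slot i holds a file id, 0 = free. */
struct files_model {
	int slot[MODEL_MAX_FDS];
};

/* Install `file` in the lowest free slot; returns the fd number or -1. */
static int model_fd_install(struct files_model *f, int file)
{
	for (int fd = 0; fd < MODEL_MAX_FDS; fd++) {
		if (f->slot[fd] == 0) {
			f->slot[fd] = file;
			return fd;
		}
	}
	return -1;
}

/*
 * Model of an exec attempt: the table is copied first ("unshare"), the new
 * fd goes into the copy, and only success commits the copy.  On failure the
 * copy is simply dropped, playing the role of reset_files_struct(), so no
 * explicit close of the new fd is needed on the error path.
 */
static int model_exec(struct files_model *task_files, int script_file, bool fail)
{
	struct files_model unshared = *task_files;
	int fd = model_fd_install(&unshared, script_file);

	if (fd < 0 || fail)
		return -1;              /* copy discarded: original table untouched */

	*task_files = unshared;         /* commit on success */
	return fd;
}
```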

Calling fd_install in binfmt_misc still seems wrong, as that exposes
the new file descriptor to user space with the old creds.

It is possible, although unlikely, for userspace to find the file
descriptor without consulting AT_EXECFD, so just to be conservative I
think we should install the file descriptor in begin_new_exec even if
the next interpreter does not support AT_EXECFD.


I am still working on how to handle recursive binfmts but I suspect it
is just a matter of having an array of struct files in struct
linux_binprm.


Eric


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-12 18:42               ` Eric W. Biederman
@ 2020-05-12 19:25                 ` Kees Cook
  2020-05-12 20:31                   ` Eric W. Biederman
  2020-05-13  0:20                 ` Linus Torvalds
  1 sibling, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-12 19:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

On Tue, May 12, 2020 at 01:42:53PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> > Should binfmt_misc do the install, or can the consuming binfmt do it?
> > i.e. when binfmt_elf sees bprm->execfd, does it perform the install
> > instead?
> 
> I am still thinking about this one, but here is where I am at.  At a
> practical level, passing the file descriptor of the script to the interpreter
> seems like something we should encourage in the long term.  It removes
> races and it is cheaper because then the interpreter does not have to
> turn around and open the script itself.

Yeah, this does sound pretty good, though I have concerns about doing
it for a process that isn't expecting it. I've seen a lot of bad code
make assumptions about initial fd numbers. :(

> Strictly speaking binfmt_misc should not need to close the file
> descriptor in binfmt_misc because we have already unshared the files
> struct and reset_files_struct should handle restoring it.

If I get what you mean, I agree. The error case is fine.

> Calling fd_install in binfmt_misc still seems wrong, as that exposes
> the new file descriptor to user space with the old creds.

I haven't dug into the details here -- is there a real risk here? The
old creds are what opened the file originally for the exec. Are you
thinking about executable-but-not-readable files?

> It is possible although unlikely for userspace to find the file
> descriptor without consulting AT_EXECFD so just to be conservative I
> think we should install the file descriptor in begin_new_exec even if
> the next interpreter does not support AT_EXECFD.

I think universally installing the fd needs to be a distinct patch --
it's going to have a lot of consequences, IMO. We can certainly deal
with them, but I don't think it should be part of this clean-up series.

> I am still working on how to handle recursive binfmts but I suspect it
> is just a matter of having an array of struct files in struct
> linux_binprm.

If install is left in binfmt_misc, then the recursive problem goes away,
yes?

-- 
Kees Cook


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-12 19:25                 ` Kees Cook
@ 2020-05-12 20:31                   ` Eric W. Biederman
  2020-05-12 23:08                     ` Kees Cook
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-12 20:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Tue, May 12, 2020 at 01:42:53PM -0500, Eric W. Biederman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>> > Should binfmt_misc do the install, or can the consuming binfmt do it?
>> > i.e. when binfmt_elf sees bprm->execfd, does it perform the install
>> > instead?
>> 
>> I am still thinking about this one, but here is where I am at.  At a
>> practical level, passing the file descriptor of the script to the interpreter
>> seems like something we should encourage in the long term.  It removes
>> races and it is cheaper because then the interpreter does not have to
>> turn around and open the script itself.
>
> Yeah, this does sound pretty good, though I have concerns about doing
> it for a process that isn't expecting it. I've seen a lot of bad code
> make assumptions about initial fd numbers. :(

Yes.  That is definitely a concern.

>> Strictly speaking binfmt_misc should not need to close the file
>> descriptor in binfmt_misc because we have already unshared the files
>> struct and reset_files_struct should handle restoring it.
>
> If I get what you mean, I agree. The error case is fine.
>
>> Calling fd_install in binfmt_misc still seems wrong, as that exposes
>> the new file descriptor to user space with the old creds.
>
> I haven't dug into the details here -- is there a real risk here? The
> old creds are what opened the file originally for the exec. Are you
> thinking about executable-but-not-readable files?

I am thinking about looking in /proc/<pid>/fd and maybe opening those
files.  That access is gated by ptrace_may_access, which is gated
by the process credentials.  So I know that, strictly speaking, it is wrong.

I think you are correct that it would only allow access to a file that
could be accessed another way.  Even execveat, at a quick glance, appears
to go through the ordinary permission checks of open.

The current code is definitely a maintenance pitfall, as it installs state
into the process early.

>> It is possible although unlikely for userspace to find the file
>> descriptor without consulting AT_EXECFD so just to be conservative I
>> think we should install the file descriptor in begin_new_exec even if
>> the next interpreter does not support AT_EXECFD.
>
> I think universally installing the fd needs to be a distinct patch --
> it's going to have a lot of consequences, IMO. We can certainly deal
> with them, but I don't think it should be part of this clean-up series.

I meant generically installing the fd not universally installing it.

>> I am still working on how to handle recursive binfmts but I suspect it
>> is just a matter of having an array of struct files in struct
>> linux_binprm.
>
> If install is left in binfmt_misc, then the recursive problem goes away,
> yes?

I don't think leaving the install in binfmt_misc is responsible at this
point.

Eric


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-12 20:31                   ` Eric W. Biederman
@ 2020-05-12 23:08                     ` Kees Cook
  2020-05-12 23:47                       ` Kees Cook
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-12 23:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

On Tue, May 12, 2020 at 03:31:57PM -0500, Eric W. Biederman wrote:
> >> It is possible although unlikely for userspace to find the file
> >> descriptor without consulting AT_EXECFD so just to be conservative I
> >> think we should install the file descriptor in begin_new_exec even if
> >> the next interpreter does not support AT_EXECFD.
> >
> > I think universally installing the fd needs to be a distinct patch --
> > it's going to have a lot of consequences, IMO. We can certainly deal
> > with them, but I don't think it should be part of this clean-up series.
> 
> I meant generically installing the fd not universally installing it.
> 
> >> I am still working on how to handle recursive binfmts but I suspect it
> >> is just a matter of having an array of struct files in struct
> >> linux_binprm.
> >
> > If install is left in binfmt_misc, then the recursive problem goes away,
> > yes?
> 
> I don't think leaving the install in binfmt_misc is responsible at this
> point.

I'm nearly certain the answer is "yes", but I wonder if we should stop
for a moment and ask: does anything still use MISC_FMT_OPEN_BINARY? It
is set by either the "O" or the "C" binfmt_misc registration flag. My installed
binfmts on Ubuntu don't use them...

I'm currently pulling a list of all the packages in Debian that depend
on the binfmt-support package and checking their flags.

-- 
Kees Cook


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-12 23:08                     ` Kees Cook
@ 2020-05-12 23:47                       ` Kees Cook
  2020-05-12 23:51                         ` Kees Cook
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-12 23:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

On Tue, May 12, 2020 at 04:08:56PM -0700, Kees Cook wrote:
> I'm nearly certain the answer is "yes", but I wonder if we should stop
> for a moment and ask: does anything still use MISC_FMT_OPEN_BINARY? It
> is set by either the "O" or the "C" binfmt_misc registration flag. My installed
> binfmts on Ubuntu don't use them...
> 
> I'm currently pulling a list of all the packages in Debian that depend
> on the binfmt-support package and checking their flags.

So, binfmt-support in Debian doesn't _support_ MISC_FMT_OPEN_BINARY
("O"):


        credentials =
                (binfmt->credentials && !strcmp (binfmt->credentials, "yes"))
                ? "C" : "";
        preserve = (binfmt->preserve && !strcmp (binfmt->preserve, "yes"))
                ? "P" : "";
        fix_binary =
                (binfmt->fix_binary && !strcmp (binfmt->fix_binary, "yes"))
                ? "F" : "";
...
        regstring = xasprintf (":%s:%c:%s:%s:%s:%s:%s%s%s\n",
                               name, type, binfmt->offset, binfmt->magic,
                               binfmt->mask, interpreter,
                               credentials, preserve, fix_binary);

However, "credentials" ("C") does imply MISC_FMT_OPEN_BINARY.
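The flag logic can be checked mechanically. A hedged sketch, assuming ':' as the delimiter (binfmt_misc actually takes the first character as the delimiter) and an interpreter path that contains no ':'; `flags_open_binary` and `reg_flags` are hypothetical helpers. The 'O'/'C' mapping follows the binfmt_misc flag parser, where 'C' sets MISC_FMT_CREDENTIALS together with MISC_FMT_OPEN_BINARY.

```c
#include <stdbool.h>
#include <string.h>

/* True if a flags field requests MISC_FMT_OPEN_BINARY, directly ('O') or via 'C'. */
static bool flags_open_binary(const char *flags)
{
	return strchr(flags, 'O') != NULL || strchr(flags, 'C') != NULL;
}

/*
 * Return the flags field (everything after the last ':') of a registration
 * string ":name:type:offset:magic:mask:interpreter:flags", or NULL if there
 * is no ':' at all.  The result points into `reg`.
 */
static const char *reg_flags(const char *reg)
{
	const char *last = strrchr(reg, ':');
	return last ? last + 1 : NULL;
}
```

For a string built like the Debian `regstring` above with credentials set, the flags field starts with "C", so `flags_open_binary()` reports that the kernel will open the binary.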


I looked at every Debian package using binfmt-support, and "only" qemu
uses "credentials".

And now I wonder if qemu actually uses the resulting AT_EXECFD ...

-- 
Kees Cook


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-12 23:47                       ` Kees Cook
@ 2020-05-12 23:51                         ` Kees Cook
  2020-05-14 14:56                           ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-12 23:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
> And now I wonder if qemu actually uses the resulting AT_EXECFD ...

It does, though I'm not sure if this is to support crossing mount points,
dropping privileges, or something else, since it does fall back to just
trying to open the file.

    execfd = qemu_getauxval(AT_EXECFD);
    if (execfd == 0) {
        execfd = open(filename, O_RDONLY);
        if (execfd < 0) {
            printf("Error while loading %s: %s\n", filename, strerror(errno));
            _exit(EXIT_FAILURE);
        }
    }


-- 
Kees Cook


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-12 18:42               ` Eric W. Biederman
  2020-05-12 19:25                 ` Kees Cook
@ 2020-05-13  0:20                 ` Linus Torvalds
  2020-05-13  2:39                   ` Rob Landley
  2020-05-14 16:49                   ` Eric W. Biederman
  1 sibling, 2 replies; 122+ messages in thread
From: Linus Torvalds @ 2020-05-13  0:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kees Cook, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> I am still thinking about this one, but here is where I am at.  At a
> practical level, passing the file descriptor of the script to the interpreter
> seems like something we should encourage in the long term.  It removes
> races and it is cheaper because then the interpreter does not have to
> turn around and open the script itself.

Yeah, I think we should continue to support it, because I think it's
the right thing to do (and we might just end up having compatibility
issues if we don't).

How about trying to move the logic to the common code, out of binfmt_misc?

IOW, how about something very similar to your "bprm->preserve_creds"
thing that you did for the credentials (also for binfmt_misc, which
shouldn't surprise anybody: binfmt_misc is simply the "this is the
generic thing for letting user mode do the final details").

> Calling fd_install in binfmt_misc still seems wrong, as that exposes
> the new file descriptor to user space with the old creds.

Right.  And it really would be good to simply not have these kinds of
very special cases inside the low-level binfmt code: I'd much rather
have the special cases in the generic code, so that we see what the
ordering is etc. One of the big problems with all these binfmt
callbacks has been the fact that it makes it so hard to think about
and change the generic code, because the low-level binfmt handlers all
do their own special thing.

So moving it to generic code would likely simplify things from that
angle, even if the actual complexity of the feature itself remains.

Besides, we really have exposed this to other code anyway thanks to
that whole bprm->interp_data thing, and the AT_EXECFD AUX entries that
we have. So it's not really "internal" to binfmt_misc _anyway_.

So how about we just move the fd_binary logic to the generic execve
code, and just have binfmt_misc set the flag for "yes, please do this",
exactly like "preserve_creds"?
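The proposal, sketched as a toy model: binfmt_misc only records that it wants the fd, and the generic code either installs it in one place or drops the reference on failure. Every structure and function name here is a hypothetical simplification (refcounts as plain ints, the fd number passed in), not the eventual kernel patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct file_model { int id; int refs; };

struct bprm_model {
	struct file_model *executable;  /* file opened on behalf of binfmt_misc */
	bool have_execfd;               /* request flag, like preserve_creds */
	int execfd;                     /* fd number, valid once installed */
};

/* binfmt_misc side: take a reference and set the flag; install nothing. */
static void misc_request_execfd(struct bprm_model *bprm, struct file_model *f)
{
	f->refs++;
	bprm->executable = f;
	bprm->have_execfd = true;
}

/*
 * Generic side, after the new credentials are committed: one place installs
 * the fd for every binfmt.  `next_free_fd` stands in for get_unused_fd_flags().
 */
static void model_begin_new_exec(struct bprm_model *bprm, int next_free_fd)
{
	if (bprm->have_execfd) {
		bprm->execfd = next_free_fd;    /* "fd_install()" */
		bprm->executable = NULL;        /* reference now owned by the fd */
	}
}

/* Generic cleanup: if exec failed before installing, drop the reference. */
static void model_free_bprm(struct bprm_model *bprm)
{
	if (bprm->executable) {
		bprm->executable->refs--;
		bprm->executable = NULL;
	}
}
```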

> It is possible although unlikely for userspace to find the file
> descriptor without consulting AT_EXECFD so just to be conservative I
> think we should install the file descriptor in begin_new_exec even if
> the next interpreter does not support AT_EXECFD.

Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
binfmt_misc, but it also shouldn't be gating this issue. In reality,
ELF is the only real binary format that matters - the script/misc
binfmts are just indirection entries - and it supports AT_EXECFD, so
let's just ignore the theoretical case of "maybe nobody exposes it".

So yes, just make it part of begin_new_exec(), and there's no reason
to support more than a single fd. No stacks or arrays of these things
required, I feel. It's not like AT_EXECFD supports the notion of
multiple fd's being reported anyway, nor does it make any sense to
have some kind of nested misc->misc binfmt nesting.

So making that whole interp_data and fd_binary thing be a generic
layer thing would make the search_binary_handler() code in binfmt_misc
be a pure tailcall too, and then the conversion to a loop ends up
working and being the right thing.

No?
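Once every redirecting handler is a pure tail call, the recursion can become a loop. A toy sketch, assuming a single `handler_fn` that stands in for walking the list of registered formats; the names and the depth limit of 5 are illustrative assumptions, loosely modelled on exec's existing -ELOOP check.

```c
#include <errno.h>

/* Hypothetical outcome of one binfmt handler in the loop-based model. */
enum handler_result {
	HANDLER_DONE,        /* a final format (e.g. ELF) loaded the image */
	HANDLER_REDIRECT,    /* script/misc: *file now names the interpreter */
	HANDLER_NOEXEC,      /* nothing recognised the file */
};

/* Stands in for trying every registered format on `*file`. */
typedef enum handler_result (*handler_fn)(const char **file);

#define MODEL_MAX_DEPTH 5

/*
 * Because a redirecting handler has no work left after its interpreter is
 * looked up, "call search_binary_handler() again" can become "go round the
 * loop again", bounded by a depth limit.
 */
static int model_exec_binprm(handler_fn handler, const char **file)
{
	for (int depth = 0; ; depth++) {
		if (depth > MODEL_MAX_DEPTH)
			return -ELOOP;
		switch (handler(file)) {
		case HANDLER_DONE:
			return 0;
		case HANDLER_REDIRECT:
			continue;       /* next iteration of the for loop */
		default:
			return -ENOEXEC;
		}
	}
}
```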

                Linus


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-13  0:20                 ` Linus Torvalds
@ 2020-05-13  2:39                   ` Rob Landley
  2020-05-13 19:51                     ` Linus Torvalds
  2020-05-14 16:49                   ` Eric W. Biederman
  1 sibling, 1 reply; 122+ messages in thread
From: Rob Landley @ 2020-05-13  2:39 UTC (permalink / raw)
  To: Linus Torvalds, Eric W. Biederman
  Cc: Kees Cook, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, LSM List, James Morris, Serge E. Hallyn,
	Andy Lutomirski



On 5/12/20 7:20 PM, Linus Torvalds wrote:
> On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> I am still thinking about this one, but here is where I am at.  At a
>> practical level, passing the file descriptor of the script to the interpreter
>> seems like something we should encourage in the long term.  It removes
>> races and it is cheaper because then the interpreter does not have to
>> turn around and open the script itself.
> 
> Yeah, I think we should continue to support it, because I think it's
> the right thing to do (and we might just end up having compatibility
> issues if we don't).
...
>> It is possible although unlikely for userspace to find the file
>> descriptor without consulting AT_EXECFD so just to be conservative I
>> think we should install the file descriptor in begin_new_exec even if
>> the next interpreter does not support AT_EXECFD.
> 
> Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
> binfmt_misc, but it also shouldn't be gating this issue. In reality,
> ELF is the only real binary format that matters - the script/misc
> binfmts are just indirection entries - and it supports AT_EXECFD, so
> let's just ignore the theoretical case of "maybe nobody exposes it".

Would this potentially make the re-exec-yourself case easier to do at some
point? (Which nommu needs to do, and /proc/self/exe isn't always available.)

Here's the first time I asked about that:

https://lore.kernel.org/lkml/200612261823.07927.rob@landley.net/

Here's the most recent:

https://lkml.org/lkml/2017/9/5/246

Here's someone else asking and being basically told "chroot isn't a thing":

http://lkml.iu.edu/hypermail/linux/kernel/0906.3/00584.html

(See also "CVE-2019-5736" and the workarounds thereto.)

Rob

P.S. Yes I'm aware it would only work properly with static binaries. Not the
first thing that's true for.


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-13  2:39                   ` Rob Landley
@ 2020-05-13 19:51                     ` Linus Torvalds
  0 siblings, 0 replies; 122+ messages in thread
From: Linus Torvalds @ 2020-05-13 19:51 UTC (permalink / raw)
  To: Rob Landley
  Cc: Eric W. Biederman, Kees Cook, Tetsuo Handa,
	Linux Kernel Mailing List, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Bernd Edlinger, linux-fsdevel, Al Viro,
	Alexey Dobriyan, Andrew Morton, Casey Schaufler, LSM List,
	James Morris, Serge E. Hallyn, Andy Lutomirski

On Tue, May 12, 2020 at 7:32 PM Rob Landley <rob@landley.net> wrote:
>
> On 5/12/20 7:20 PM, Linus Torvalds wrote:
> > Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
> > binfmt_misc, but it also shouldn't be gating this issue. In reality,
> > ELF is the only real binary format that matters - the script/misc
> > binfmts are just indirection entries - and it supports AT_EXECFD, so
> > let's just ignore the theoretical case of "maybe nobody exposes it".
>
> Would this potentially make the re-exec-yourself case easier to do at some
> point? (Which nommu needs to do, and /proc/self/exe isn't always available.)

AT_EXECFD may be an ELF thing, but normal ELF binaries don't do that
"we have a fd". So it only triggers for binfmt_misc (and only when the
flag is set for "I want the fd").

So no, this wouldn't help re-exec-yourself in general.

Although I guess we could add an ELF section note that does that whole
"executable fd" thing for other things too.

Everything is possible in theory...

               Linus


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-11 19:10             ` Rob Landley
@ 2020-05-13 21:59               ` Eric W. Biederman
  2020-05-14 18:46                 ` Rob Landley
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-13 21:59 UTC (permalink / raw)
  To: Rob Landley
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Kees Cook, Greg Ungerer,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski, dalias

Rob Landley <rob@landley.net> writes:

> On 5/11/20 9:33 AM, Eric W. Biederman wrote:
>> What I do see is that interp_data is just a parameter that is smuggled
>> into the call of search binary handler.  And the next binary handler
>> needs to be binfmt_elf for it to make much sense, as only binfmt_elf
>> (and binfmt_elf_fdpic) deals with BINPRM_FLAGS_EXECFD.
>
> The binfmt_elf_fdpic driver is separate from binfmt_elf for the same reason
> ext2/ext3/ext4 used to have 3 drivers: fdpic is really just binfmt_elf with the
> 4 main sections (text, data, bss, rodata) able to move independently of each
> other (each tracked with its own base pointer).
>
> It's kind of -fPIE on steroids, and various security people have sniffed at it
> over the years to give ASLR more degrees of freedom on with-MMU systems. Many
> moons ago Rich Felker proposed teaching the fdpic loader how to load normal ELF
> binaries so there's just the one loader (there's a flag in the ELF header to say
> whether the sections are independent or not).

Careful with your terminology.  ELF sections are for .o files.
Executables have segments.  And reading through the code, it is the
program segments that are independently relocatable.
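The segment/section distinction is visible in any binary: the loader walks the program headers, not the section headers. A minimal sketch that counts PT_LOAD segments, assuming a native-endian 64-bit ELF file (it returns -1 for anything else); `count_load_segments` is a hypothetical helper built only on the <elf.h> types.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Count PT_LOAD program headers in an ELF64 file; -1 on error. */
static int count_load_segments(const char *path)
{
	FILE *f = fopen(path, "rb");
	if (!f)
		return -1;

	Elf64_Ehdr eh;
	int count = -1;
	if (fread(&eh, sizeof eh, 1, f) != 1 ||
	    memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64)
		goto out;

	count = 0;
	for (int i = 0; i < eh.e_phnum; i++) {
		Elf64_Phdr ph;
		long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
		if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
			count = -1;
			goto out;
		}
		if (ph.p_type == PT_LOAD)       /* a segment the loader maps */
			count++;
	}
out:
	fclose(f);
	return count;
}
```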

There is a flag, but it is defined per architecture, and I don't think one
of the architectures defines it.

I looked at ARM and apparently with an MMU ARM turns fdpic binaries into
PIE executables.  I am not certain why.

The registers passed to the entry point are also different for both
cases.

I think it would have been nice if the fdpic support had used a
different ELF type, instead of being identified in a different way
on each architecture.

All that aside the core dumping code looks to be essentially the same
between binfmt_elf.c and binfmt_elf_fdpic.c.  Do you think people would
be interested in refactoring binfmt_elf.c and binfmt_elf_fdpic.c so that
they could share the same core dumping code?

Eric


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-12 23:51                         ` Kees Cook
@ 2020-05-14 14:56                           ` Eric W. Biederman
  2020-05-14 16:56                             ` Casey Schaufler
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-14 14:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
>> And now I wonder if qemu actually uses the resulting AT_EXECFD ...
>
> It does, though I'm not sure if this is to support crossing mount points,
> dropping privileges, or something else, since it does fall back to just
> trying to open the file.
>
>     execfd = qemu_getauxval(AT_EXECFD);
>     if (execfd == 0) {
>         execfd = open(filename, O_RDONLY);
>         if (execfd < 0) {
>             printf("Error while loading %s: %s\n", filename, strerror(errno));
>             _exit(EXIT_FAILURE);
>         }
>     }

My hunch is that the fallback exists from a time when the kernel did not
implement AT_EXECFD, or so that qemu can run on kernels that don't
implement AT_EXECFD.  It doesn't really matter unless the executable is
suid, or otherwise changes privileges.


I looked into this a bit to remind myself why exec works the way it
works, with changing privileges.

The classic attack is pointing a symlink at a #! script that is suid or
otherwise changes privileges.  The kernel will open the script and set
the privileges, read the interpreter from the first line, and proceed to
exec the interpreter.  The interpreter will then open the script using
the pathname supplied by the kernel: the name of the symlink.
Before the interpreter reopens the script, the attacker replaces
the symlink with one pointing at a script that does something else,
which then gets to run with the privileges of the original script.


Defending against that time-of-check vs time-of-use attack is why
bprm_fill_uid and cap_bprm_set_creds use the credentials derived from
the interpreter instead of the credentials derived from the script.


The other defense is to replace the pathname of the executable that the
interpreter will open with /dev/fd/N.
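A sketch of why the /dev/fd/N name defeats the swap, assuming Linux, where /dev/fd normally links to /proc/self/fd and opening /proc/self/fd/N reopens the very file that fd N refers to, even after the original path has been replaced; `fd_path` and `same_file` are hypothetical helpers.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build the "/dev/fd/N" name an interpreter could be given instead of a path. */
static int fd_path(int fd, char *buf, size_t size)
{
	int n = snprintf(buf, size, "/dev/fd/%d", fd);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* True if both fds refer to the same inode on the same device. */
static int same_file(int a, int b)
{
	struct stat sa, sb;
	if (fstat(a, &sa) < 0 || fstat(b, &sb) < 0)
		return 0;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Reopening through the fd name lands on the original inode, while reopening the pathname lands on whatever was renamed over it.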

All of this predates Linux entirely.  I do remember this was fixed at
some point in Linux but I don't remember the details.  I can just read
the solution that was picked in the code.



All of this makes me wonder how the LSMs are protected against this
attack.

Let's see.  The following LSMs implement bprm_set_creds:
tomoyo   - Abuses bprm_set_creds to call tomoyo_load_policy [ safe ]
smack    - Requires CAP_MAC_ADMIN to smack setxattrs        [ vulnerable? ]
           Uses those xattrs in smack_bprm_set_creds
apparmor - Everything is based on names so the symlink      [ safe? ]
           attack won't work as it has the wrong name.
           As long as the trusted names can't be renamed
           apparmor appears good.
selinux  - Appears to let anyone set selinux xattrs         [ safe? ]
           Requires permission for a sid transfer
           As the attack appears not to allow anything that
           would not be allowed anyway it looks like selinux
           is safe.

LSM folks, especially Casey: am I reading this correctly?  Did I
correctly infer how your LSMs deal with the time-of-check to time-of-use
attack on the script name?

Eric



* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-13  0:20                 ` Linus Torvalds
  2020-05-13  2:39                   ` Rob Landley
@ 2020-05-14 16:49                   ` Eric W. Biederman
  1 sibling, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-14 16:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> I am still thinking about this one, but here is where I am at.  At a
>> practical level, passing the file descriptor of the script to the interpreter
>> seems like something we should encourage in the long term.  It removes
>> races and it is cheaper because then the interpreter does not have to
>> turn around and open the script itself.
>
> Yeah, I think we should continue to support it, because I think it's
> the right thing to do (and we might just end up having compatibility
> issues if we don't).
>
> How about trying to move the logic to the common code, out of binfmt_misc?
>
> IOW, how about something very similar to your "bprm->preserve_creds"
> thing that you did for the credentials (also for binfmt_misc, which
> shouldn't surprise anybody: binfmt_misc is simply the "this is the
> generic thing for letting user mode do the final details").
>
>> Calling fd_install in binfmt_misc still seems wrong, as that exposes
>> the new file descriptor to user space with the old creds.
>
> Right.  And it really would be good to simply not have these kinds of
> very special cases inside the low-level binfmt code: I'd much rather
> have the special cases in the generic code, so that we see what the
> ordering is etc. One of the big problems with all these binfmt
> callbacks has been the fact that it makes it so hard to think about
> and change the generic code, because the low-level binfmt handlers all
> do their own special thing.
>
> So moving it to generic code would likely simplify things from that
> angle, even if the actual complexity of the feature itself remains.
>
> Besides, we really have exposed this to other code anyway thanks to
> that whole bprm->interp_data thing, and the AT_EXECFD AUX entries that
> we have. So it's not really "internal" to binfmt_misc _anyway_.
>
> So how about we just move the fd_binary logic to the generic execve
> code, and just binfmt_misc set the flag for "yes, please do this",
> exactly like "preserve_creds"?
>
>> It is possible although unlikely for userspace to find the file
>> descriptor without consulting AT_EXECFD so just to be conservative I
>> think we should install the file descriptor in begin_new_exec even if
>> the next interpreter does not support AT_EXECFD.
>
> Ack. I think the AT_EXECFD thing is a sign that this isn't internal to
> binfmt_misc, but it also shouldn't be gating this issue. In reality,
> ELF is the only real binary format that matters - the script/misc
> binfmts are just indirection entries - and it supports AT_EXECFD, so
> let's just ignore the theoretical case of "maybe nobody exposes it".
>
> So yes, just make it part of begin_new_exec(), and there's no reason
> to support more than a single fd. No stacks or arrays of these things
> required, I feel. It's not like AT_EXECFD supports the notion of
> multiple fd's being reported anyway, nor does it make any sense to
> have some kind of nested misc->misc binfmt nesting.
>
> So making that whole interp_data and fd_binary thing be a generic
> layer thing would make the search_binary_handler() code in binfmt_misc
> be a pure tailcall too, and then the conversion to a loop ends up
> working and being the right thing.

That is pretty much what I have been thinking.  I have just been taking
it slow so that I find as many funny corner cases as I can.

Nothing ever clears the BINPRM_FLAGS_EXECFD so the current code can
not support nesting.

Now I do think a nested misc->misc binfmt thing can make sense in
principle.  I have an old DOS Spectrum emulator that I use to play some
of the games that I grew up with.  Running that emulator makes me two
emulators deep.  I can also imagine writing a domain-specific language
in Python or Perl, and setting things up so scripts in the domain-specific
language can be run directly.

So I think I need to deliberately test for and prevent a nested
misc->misc, just so data structures don't get stomped.  If the cases
where it could be useful prove sufficiently interesting, we can enable
them later.
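Here is a minimal userspace model of the guard described above. It assumes a single execfd slot, like the one AT_EXECFD can report. The struct and its execfd_pending/execfd_owner fields are hypothetical, not the kernel's linux_binprm, and -ELOOP is only an illustrative error code. In the model, a second misc-style handler that wants the binary's fd refuses the request instead of overwriting the first one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct linux_binprm. */
struct model_binprm {
	bool execfd_pending;	/* some handler asked for the binary's fd */
	int execfd_owner;	/* nesting level that made the request, -1 if none */
};

/*
 * A misc-style handler at nesting level 'level' that wants the binary's
 * fd reported via AT_EXECFD.  Only one fd can be reported, so a nested
 * misc->misc request is refused rather than stomping the first one.
 * -ELOOP is an illustrative choice of error code.
 */
static int model_misc_wants_execfd(struct model_binprm *bprm, int level)
{
	assert(level >= 0);
	if (bprm->execfd_pending)
		return -ELOOP;
	bprm->execfd_pending = true;
	bprm->execfd_owner = level;
	return 0;
}
```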

Eric

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-14 14:56                           ` Eric W. Biederman
@ 2020-05-14 16:56                             ` Casey Schaufler
  2020-05-14 17:02                               ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Casey Schaufler @ 2020-05-14 16:56 UTC (permalink / raw)
  To: Eric W. Biederman, Kees Cook
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Greg Ungerer, Rob Landley,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, LSM List, James Morris, Serge E. Hallyn,
	Andy Lutomirski, Casey Schaufler

On 5/14/2020 7:56 AM, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
>
>> On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
>>> And now I wonder if qemu actually uses the resulting AT_EXECFD ...
>> It does, though I'm not sure if this is to support crossing mount points,
>> dropping privileges, or something else, since it does fall back to just
>> trying to open the file.
>>
>>     execfd = qemu_getauxval(AT_EXECFD);
>>     if (execfd == 0) {
>>         execfd = open(filename, O_RDONLY);
>>         if (execfd < 0) {
>>             printf("Error while loading %s: %s\n", filename, strerror(errno));
>>             _exit(EXIT_FAILURE);
>>         }
>>     }
> My hunch is that the fallback exists from a time when the kernel did not
> implement AT_EXECFD, or so that qemu can run on kernels that don't
> implement AT_EXECFD.  It doesn't really matter unless the executable is
> suid, or otherwise changes privileges.
>
>
> I looked into this a bit to remind myself why exec works the way it
> works, with changing privileges.
>
> The classic attack is pointing a symlink at a #! script that is suid or
> otherwise changes privileges.  The kernel will open the script and set
> the privileges, read the interpreter from the first line, and proceed to
> exec the interpreter.  The interpreter will then open the script using
> the pathname supplied by the kernel.  The name of the symlink.
> Before the interpreter reopens the script the attack would replace
> the symlink with a script that does something else, but gets to run
> with the privileges of the script.
>
>
> Defending against that time of check vs time of use attack is why
> bprm_fill_uid, and cap_bprm_set_creds use the credentials derived from
> the interpreter instead of the credentials derived from the script.
>
>
> The other defense is to replace the pathname of the executable that the
> interpreter will open with /dev/fd/N.
>
> All of this predates Linux entirely.  I do remember this was fixed at
> some point in Linux but I don't remember the details.  I can just read
> the solution that was picked in the code.
>
>
>
> All of this makes me wonder how the LSMs are protected against this
> attack.
>
> Let's see, the following LSMs implement bprm_set_creds:
> tomoyo   - Abuses bprm_set_creds to call tomoyo_load_policy [ safe ]
> smack    - Requires CAP_MAC_ADMIN to smack setxattrs        [ vulnerable? ]
>            Uses those xattrs in smack_bprm_set_creds

What is the concern? If the xattrs change after the check,
the behavior should still be consistent. 

> apparmor - Everything is based on names so the symlink      [ safe? ]
>            attack won't work as it has the wrong name.
>            As long as the trusted names can't be renamed
>            apparmor appears good.
> selinux  - Appears to let anyone set selinux xattrs         [ safe? ]
>            Requires permission for a sid transfer
>            As the attack appears not to allow anything that
>            would not be allowed anyway it looks like selinux
>            is safe.
>
> LSM folks, especially Casey am I reading this correctly?  Did I
> correctly infer how your LSMs deal with the time of check to time of use
> attack on the script name?
>
> Eric
>


* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-14 16:56                             ` Casey Schaufler
@ 2020-05-14 17:02                               ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-14 17:02 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Kees Cook, Linus Torvalds, Tetsuo Handa,
	Linux Kernel Mailing List, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski

Casey Schaufler <casey@schaufler-ca.com> writes:

> On 5/14/2020 7:56 AM, Eric W. Biederman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>>
>>> On Tue, May 12, 2020 at 04:47:14PM -0700, Kees Cook wrote:
>>>> And now I wonder if qemu actually uses the resulting AT_EXECFD ...
>>> It does, though I'm not sure if this is to support crossing mount points,
>>> dropping privileges, or something else, since it does fall back to just
>>> trying to open the file.
>>>
>>>     execfd = qemu_getauxval(AT_EXECFD);
>>>     if (execfd == 0) {
>>>         execfd = open(filename, O_RDONLY);
>>>         if (execfd < 0) {
>>>             printf("Error while loading %s: %s\n", filename, strerror(errno));
>>>             _exit(EXIT_FAILURE);
>>>         }
>>>     }
>> My hunch is that the fallback exists from a time when the kernel did not
>> implement AT_EXECFD, or so that qemu can run on kernels that don't
>> implement AT_EXECFD.  It doesn't really matter unless the executable is
>> suid, or otherwise changes privileges.
>>
>>
>> I looked into this a bit to remind myself why exec works the way it
>> works, with changing privileges.
>>
>> The classic attack is pointing a symlink at a #! script that is suid or
>> otherwise changes privileges.  The kernel will open the script and set
>> the privileges, read the interpreter from the first line, and proceed to
>> exec the interpreter.  The interpreter will then open the script using
>> the pathname supplied by the kernel.  The name of the symlink.
>> Before the interpreter reopens the script the attack would replace
>> the symlink with a script that does something else, but gets to run
>> with the privileges of the script.
>>
>>
>> Defending against that time of check vs time of use attack is why
>> bprm_fill_uid, and cap_bprm_set_creds use the credentials derived from
>> the interpreter instead of the credentials derived from the script.
>>
>>
>> The other defense is to replace the pathname of the executable that the
>> interpreter will open with /dev/fd/N.
>>
>> All of this predates Linux entirely.  I do remember this was fixed at
>> some point in Linux but I don't remember the details.  I can just read
>> the solution that was picked in the code.
>>
>>
>>
>> All of this makes me wonder how the LSMs are protected against this
>> attack.
>>
>> Let's see, the following LSMs implement bprm_set_creds:
>> tomoyo   - Abuses bprm_set_creds to call tomoyo_load_policy [ safe ]
>> smack    - Requires CAP_MAC_ADMIN to smack setxattrs        [ vulnerable? ]
>>            Uses those xattrs in smack_bprm_set_creds
>
> What is the concern? If the xattrs change after the check,
> the behavior should still be consistent.

The concern is that there are xattrs set on a #! script.  Someone
replaces the script after smack reads the xattr and sets bprm->cred but
before the interpreter reopens the script.

In short, if there is one script with xattrs set, I can run any script
as if those xattrs were set on it.

I don't know the smack security model well enough to know if that
is a problem or not.  It looks like it may be a concern because smack
limits who can mess with its security xattrs.
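The /dev/fd/N defense quoted above can be checked from userspace. This Linux-only sketch assumes that /tmp is writable and that /proc is mounted. It uses /proc/self/fd/N, which /dev/fd normally links to. The sketch works in four steps:

1. It opens a file the way the kernel opens the script.
2. It swaps a different file in under the same name.
3. It confirms that the name now points elsewhere.
4. It confirms that reopening through the fd path still reaches the original inode.

The helper names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create 'path' holding one byte; return an fd open on it, or -1. */
static int make_file(const char *path, char byte)
{
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, &byte, 1) != 1) {
		close(fd);
		return -1;
	}
	return fd;
}

static int same_file(const struct stat *a, const struct stat *b)
{
	return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/* 1: name hijacked but /proc/self/fd/N still reaches the original; 0: not; -1: error. */
static int fd_path_defeats_swap(void)
{
	char dir[] = "/tmp/execfd-demo-XXXXXX";
	char script[64], evil[64], viafd[64];
	struct stat orig, byname, reopened;
	int fd, efd, fd2, result = -1;

	if (!mkdtemp(dir))
		return -1;
	snprintf(script, sizeof(script), "%s/script", dir);
	snprintf(evil, sizeof(evil), "%s/evil", dir);

	fd = make_file(script, 'A');	/* the file the kernel checked */
	if (fd < 0)
		goto out_dir;
	efd = make_file(evil, 'B');
	if (efd < 0)
		goto out_fd;
	close(efd);

	/* The attacker replaces the name after the check. */
	if (fstat(fd, &orig) || rename(evil, script) || stat(script, &byname))
		goto out_fd;

	/* The interpreter reopens through the fd path instead of the name. */
	snprintf(viafd, sizeof(viafd), "/proc/self/fd/%d", fd);
	fd2 = open(viafd, O_RDONLY);
	if (fd2 < 0)
		goto out_fd;
	if (fstat(fd2, &reopened) == 0)
		result = !same_file(&byname, &orig) && same_file(&reopened, &orig);
	close(fd2);
out_fd:
	close(fd);
	unlink(script);
	unlink(evil);	/* only still exists if the rename failed */
out_dir:
	rmdir(dir);
	return result;
}
```

Reopening via the fd works even though the original inode no longer has a name, because the open fd keeps it alive.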

Eric

* Re: [PATCH 3/5] exec: Remove recursion from search_binary_handler
  2020-05-13 21:59               ` Eric W. Biederman
@ 2020-05-14 18:46                 ` Rob Landley
  0 siblings, 0 replies; 122+ messages in thread
From: Rob Landley @ 2020-05-14 18:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linus Torvalds, Tetsuo Handa, Linux Kernel Mailing List,
	Oleg Nesterov, Jann Horn, Kees Cook, Greg Ungerer,
	Bernd Edlinger, linux-fsdevel, Al Viro, Alexey Dobriyan,
	Andrew Morton, Casey Schaufler, LSM List, James Morris,
	Serge E. Hallyn, Andy Lutomirski, dalias

On 5/13/20 4:59 PM, Eric W. Biederman wrote:
> Careful with your terminology.  ELF sections are for .o's For
> executables ELF have segments.  And reading through the code it is the
> program segments that are independently relocatable.

Sorry, I have trouble keeping this stuff straight when it's not in front of me.
(I have a paperback copy of the old "linkers and loaders" book and it was the
driest thing I have _ever_ slogged through. Back before the Linux Foundation ate
the FSG I was pushing https://refspecs.linuxbase.org/ to include the missing ABI
supplements; I have copies of ones it doesn't include, collected from now long-dead sites...)

But more recently I've just made puppy eyes at Rich Felker to have him fix this
stuff for me, because I do _not_ retain the terminology here. REL vs RELA vs
PLT, can you have a PLT without a GOT...?

> There is a flag but it is defined per architecture and I don't think one
> of the architectures define it.

They all check for one, but I don't remember there being a #define.

I have a todo item to check more architectures' fdpic binaries; this one was from
sh2eb (à la j-core):

  https://github.com/landley/toybox/commit/d61aeaf9e#diff-4442ddbb8949R65

There was the out-of-tree ARM fdpic toolchain from the French guys for Cortex-M,
and the original FRV paper, and in theory Blackfin, but nothing they touched ever
got merged upstream anywhere:

In _theory_ you could do fdpic for x86, but as with u-boot for x86 nobody ever
bothers because it's got an x86-only solution. (And then the x86 version of
stuff gets pushed to other platforms because all our device tree files were
GPLed so of course acpi for arm became a thing. Sigh...)

> I looked at ARM and apparently with an MMU ARM turns fdpic binaries into
> PIE executables.  I am not certain why.

Falling back to a more widely tested codepath, I expect. Also maybe it saves 3
registers if all 4 are using the same base register? Map them linearly and it
becomes "single base + offset"? Which of course loses the extra ASLR benefits
the security people wanted, but "undoing what the security people want in the
name of an unmeasurable microbenchmark optimization" is a proud tradition.

Just because the 4 segments are compiled as independently relocatable doesn't
mean they HAVE to be. (You'd think the code would be using different register
numbers to index stuff so you'd STILL be using 4 registers, but I haven't looked
at what arm's doing...)
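To make the "single base + offset" point concrete, here is a sketch of FDPIC-style address translation. The struct is shaped after elf32_fdpic_loadseg from <linux/elf-fdpic.h>. The lookup function is a hypothetical illustration, not code from the kernel or any ld.so. Each segment carries its own displacement. When every segment shares one bias, as in the PIE-like layout the normal ELF loader produces, the translation collapses to a single base plus offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shaped after struct elf32_fdpic_loadseg in <linux/elf-fdpic.h>. */
struct model_loadseg {
	uint32_t addr;		/* where the loader actually mapped the segment */
	uint32_t p_vaddr;	/* address recorded in the program header */
	uint32_t p_memsz;	/* size of the segment in memory */
};

/*
 * Translate a link-time address into a run-time one: find the segment that
 * contains it and apply that segment's own displacement.  Returns 0 if no
 * segment covers vaddr.
 */
static uint32_t model_fdpic_reloc(const struct model_loadseg *segs, size_t nsegs,
				  uint32_t vaddr)
{
	for (size_t i = 0; i < nsegs; i++) {
		const struct model_loadseg *s = &segs[i];

		if (vaddr >= s->p_vaddr && vaddr - s->p_vaddr < s->p_memsz)
			return s->addr + (vaddr - s->p_vaddr);
	}
	return 0;
}
```

With independent bases, text and data land wherever the loader put them. With a single bias, addr - p_vaddr is the same for every segment, which is the single base + offset case.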

> The registers passed to the entry point are also different for both
> cases.

From the same machine code chunks? I boggle at what the ld.so fixup is doing then...

> I think it would have been nice if the fdpic support had used a
> different ELF type, instead of a different depending on using a
> different architecture.

This is what you get when a blackfin developer talks to the gnu/binutils developers:

  https://sourceware.org/legacy-ml/binutils/2008-04/msg00350.html

> All that aside the core dumping code looks to be essentially the same
> between binfmt_elf.c and binfmt_elf_fdpic.c.  Do you think people would
> be interested in refactoring binfmt_elf.c and binfmt_elf_fdpic.c so that
> they could share the same core dumping code?

I think merging the two of them together entirely would be a good idea, and
anything that can collapse together I'm happy to regression test on sh2.

I also note that qemu-sh4eb can run these binaries, maybe I can whip up a
qemu-system-sh4eb that runs a nommu fdpic userspace...

[hours later]

Ok, here's me asking Rich Felker a question:

>>> So fdpic binaries run under qemu-sh2eb and there's a qemu-system-sh2eb that
>>> SHOULD also be able to run them under the r2d board emulation, and the kernel
>>> builds fine under the sh2eb compiler but I can't enable fdpic support without
>>> CONFIG_NOMMU, and if I yank that dependency from Kconfig (which only sh2 has,
>>> arm and such do fdpic with or without mmu) the build breaks with:
>>>
>>> /home/landley/toybox/clean/ccc/sh2eb-linux-muslfdpic-cross/bin/sh2eb-linux-muslfdpic-ld:
>>> fs/binfmt_elf_fdpic.o: in function `load_elf_fdpic_binary':
>>> binfmt_elf_fdpic.c:(.text+0x1734): undefined reference to
>>> `elf_fdpic_arch_lay_out_mm'
>>>
>>> The problem is if I switch off CONFIG_MMU in the kernel, buckets of stuff in the
>>> r2d board kernel config changes and suddenly I don't get serial output from the
>>> qemu-system-sh2eb -M r2d boot anymore. Before it was running the kernel but just
>>> failing to run init...

And his response:

>> I don't think qemu-system-sh4eb can boot a nommu kernel. But you don't
>> need to in order to do userspace-only testing. Just build a normal
>> sh4eb kernel. It doesn't need CONFIG_BINFMT_ELF_FDPIC. The normal ELF
>> loader can load FDPIC just fine, because a valid FDPIC ELF file is a
>> valid ELF file, just with more constraints (in same sense a square is
>> a rectangle). The normal ELF loader won't independently float the text
>> and data segments, but that's okay because your emulated system has an
>> MMU and can just map them adjacently like they show up in the ELF file
>> with their untransformed addresses.
>> 
>> Now that I think about it, it's possible that the ARM folks broke this
>> when adding support for enabling CONFIG_BINFMT_ELF_FDPIC with MMU. If
>> so, and you find you really do need the FDPIC loader now because they
>> made the normal ELF loader refuse to do it, I think it will suffice to
>> copy the ARM version of elf_fdpic_arch_lay_out_mm from
>> arch/arm/kernel/elf.c to somewhere it will be compiled on SH.

I.e., testing the kernel fdpic loader under qemu is NOT EASY (because the fdpic
loader refuses to build in a with-mmu context, and the relevant board emulations
refuse to build without), but it can fall back to the conventional ELF loader
which collates the segments and treats fdpic as PIE? (Which... is how qemu-sh2eb
application emulation is loading them...?)

Which was news to me...

> Eric

Rob

* [PATCH v2 0/8] exec: Control flow simplifications
  2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
                       ` (4 preceding siblings ...)
  2020-05-09 19:42     ` [PATCH 5/5] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
@ 2020-05-19  0:29     ` Eric W. Biederman
  2020-05-19  0:29       ` [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids Eric W. Biederman
                         ` (10 more replies)
  5 siblings, 11 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


It is hard to follow the control flow in exec.c as the code has evolved over
time and something that used to work one way now works another.  This set of
changes attempts to address the worst of that, to remove unnecessary work
and to make the code a little easier to follow.

The churn is a bit higher than the last version of this patchset, with
renaming and cleaning up of comments.  I have split security_bprm_set_creds
into security_bprm_creds_for_exec and security_bprm_repopulate_creds.  My
goal was to make it clear that one hook completes its work while the other
recalculates its work each time a new interpreter is selected.

I have added a new change at the beginning to make it clear that neither
security_bprm_creds_for_exec nor security_bprm_repopulate_creds needs to be
implemented, as prepare_exec_creds properly does the work of setting up
credentials unless something special is going on.

I have made the execfd support generic and moved it out of binfmt_misc so
that I can remove the recursion.
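On the consumer side, an interpreter might use AT_EXECFD roughly as follows, in the spirit of the qemu snippet quoted earlier in this thread. This sketch relies on getauxval(3) reporting a missing entry by returning 0 and setting errno to ENOENT, as glibc 2.19 and later and musl do. That lets it tell "no AT_EXECFD" apart from a legitimate fd 0. get_exec_fd is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/auxv.h>
#include <unistd.h>

/*
 * Return an fd for the binary we are supposed to run: the one the kernel
 * passed in AT_EXECFD if present, otherwise open it by name (racy, which
 * is exactly why AT_EXECFD exists).
 */
static int get_exec_fd(const char *fallback_path)
{
	unsigned long v;

	errno = 0;
	v = getauxval(AT_EXECFD);
	if (errno != ENOENT)
		return (int)v;	/* entry present, even if the fd is 0 */
	return open(fallback_path, O_RDONLY);
}
```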

I have moved reassigning bprm->file into the loop that replaces the
recursion.  In doing so I discovered that binfmt_script was naughty and
was returning -ENOEXEC in such a way that the search_binary_handler loop
could not continue.  So I added a change to remove that naughtiness.
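The loop that replaces the recursion can be modeled in userspace. Everything here is a hypothetical simplification rather than the code in this series: model_bprm, the example handlers and the depth limit of 5. The model follows three rules:

- A handler returns -ENOEXEC to pass.
- A loader returns 0.
- A script-style handler records an interpreter and returns 0, so the loop retries with the interpreter as the new file instead of recursing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct linux_binprm. */
struct model_bprm {
	const char *file;	 /* what we are currently trying to run */
	const char *interpreter; /* set by a handler that wants to redirect */
};

typedef int (*model_handler)(struct model_bprm *bprm);

/* Illustrative limit on trips around the loop. */
#define MODEL_MAX_DEPTH 5

/* Ask each handler in turn; -ENOEXEC means "not my format, try the next". */
static int model_search(struct model_bprm *bprm, const model_handler *h, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = h[i](bprm);

		if (ret != -ENOEXEC)
			return ret;
	}
	return -ENOEXEC;
}

/* Iterate instead of recursing: an interpreter request restarts the search. */
static int model_exec(struct model_bprm *bprm, const model_handler *h, size_t n)
{
	for (int depth = 0;; depth++) {
		int ret;

		if (depth > MODEL_MAX_DEPTH)
			return -ELOOP;
		ret = model_search(bprm, h, n);
		if (ret < 0)
			return ret;
		if (!bprm->interpreter)
			return 0;	/* a real loader accepted bprm->file */
		bprm->file = bprm->interpreter;
		bprm->interpreter = NULL;
	}
}

/* Example handlers: "script:X" is run by X; only "/bin/interp" is loadable. */
static int script_handler(struct model_bprm *bprm)
{
	if (strncmp(bprm->file, "script:", 7) != 0)
		return -ENOEXEC;
	bprm->interpreter = bprm->file + 7;
	return 0;
}

static int loader_handler(struct model_bprm *bprm)
{
	return strcmp(bprm->file, "/bin/interp") == 0 ? 0 : -ENOEXEC;
}
```

Nested scripts simply take more trips around the loop, until the depth limit returns -ELOOP.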

Eric W. Biederman (8):
      exec: Teach prepare_exec_creds how exec treats uids & gids
      exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
      exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
      exec: Allow load_misc_binary to call prepare_binfmt unconditionally
      exec: Move the call of prepare_binprm into search_binary_handler
      exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
      exec: Generic execfd support
      exec: Remove recursion from search_binary_handler

 arch/alpha/kernel/binfmt_loader.c  | 11 +----
 fs/binfmt_elf.c                    |  4 +-
 fs/binfmt_elf_fdpic.c              |  4 +-
 fs/binfmt_em86.c                   | 13 +----
 fs/binfmt_misc.c                   | 69 ++++-----------------------
 fs/binfmt_script.c                 | 82 ++++++++++++++------------------
 fs/exec.c                          | 97 ++++++++++++++++++++++++++------------
 include/linux/binfmts.h            | 36 ++++++--------
 include/linux/lsm_hook_defs.h      |  3 +-
 include/linux/lsm_hooks.h          | 52 +++++++++++---------
 include/linux/security.h           | 14 ++++--
 kernel/cred.c                      |  3 ++
 security/apparmor/domain.c         |  7 +--
 security/apparmor/include/domain.h |  2 +-
 security/apparmor/lsm.c            |  2 +-
 security/commoncap.c               |  9 ++--
 security/security.c                |  9 +++-
 security/selinux/hooks.c           |  8 ++--
 security/smack/smack_lsm.c         |  9 ++--
 security/tomoyo/tomoyo.c           | 12 ++---
 20 files changed, 202 insertions(+), 244 deletions(-)

* [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
@ 2020-05-19  0:29       ` Eric W. Biederman
  2020-05-19 18:03         ` Kees Cook
  2020-05-19  0:30       ` [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds Eric W. Biederman
                         ` (9 subsequent siblings)
  10 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


It is almost possible to use the result of prepare_exec_creds with no
modifications during exec.  Update prepare_exec_creds to initialize
the suid and the fsuid to the euid, and the sgid and the fsgid to the
egid.  This is all that is needed to handle the common case of exec
when nothing special like a setuid exec is happening.
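Here is a tiny model of the two lines this patch adds, using a hypothetical subset of struct cred. It matches the POSIX rule that after an ordinary exec the saved set-user-ID and saved set-group-ID are copied from the effective IDs. The Linux-specific fsuid/fsgid follow the effective IDs the same way.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical subset of struct cred: only the id fields. */
struct model_cred {
	uid_t uid, euid, suid, fsuid;
	gid_t gid, egid, sgid, fsgid;
};

/* The two assignments this patch adds to prepare_exec_creds(). */
static void model_prepare_exec_creds(struct model_cred *new)
{
	new->suid = new->fsuid = new->euid;
	new->sgid = new->fsgid = new->egid;
}
```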

That this preserves the existing behavior of exec can be verified
by examining bprm_fill_uid and cap_bprm_set_creds.

This change makes it clear that the later parts of exec that
update bprm->cred only need to handle special cases such
as setuid exec and change of domains.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/cred.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cred.c b/kernel/cred.c
index 71a792616917..421b1149c651 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -315,6 +315,9 @@ struct cred *prepare_exec_creds(void)
 	new->process_keyring = NULL;
 #endif
 
+	new->suid = new->fsuid = new->euid;
+	new->sgid = new->fsgid = new->egid;
+
 	return new;
 }
 
-- 
2.25.0


* [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
  2020-05-19  0:29       ` [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids Eric W. Biederman
@ 2020-05-19  0:30       ` Eric W. Biederman
  2020-05-19 15:34         ` Casey Schaufler
  2020-05-19 18:10         ` Kees Cook
  2020-05-19  0:31       ` [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds Eric W. Biederman
                         ` (8 subsequent siblings)
  10 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Today security_bprm_set_creds has several implementations:
apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
smack_bprm_set_creds, and tomoyo_bprm_set_creds.

Except for cap_bprm_set_creds they all test bprm->called_set_creds and
return immediately if it is true.  The function cap_bprm_set_creds
ignores bprm->called_set_creds entirely.

Create a new LSM hook security_bprm_creds_for_exec that is called just
before prepare_binprm in __do_execve_file, resulting in an LSM hook
that is called exactly once for the entirety of exec.  Move the bits
of security_bprm_set_creds that only want to be called once per exec
into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
behind.
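Here is a userspace model of why bprm->called_set_creds can go away. Hypothetical counters stand in for LSM work, and all LSMs are folded into one function. For a #! script, which takes two prepare_binprm() passes, the old flag-guarded hook and the new split hooks do the same amount of once-only and per-pass work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the work LSMs do during one exec. */
struct model_creds_state {
	bool called_set_creds;	/* the flag this patch removes */
	int once_work;		/* e.g. a security-domain transition */
	int per_pass_work;	/* e.g. recomputing file capabilities */
};

/* Before: one hook per prepare_binprm() pass; the flag guards the once-only part. */
static void old_bprm_set_creds(struct model_creds_state *s)
{
	s->per_pass_work++;	/* like cap_bprm_set_creds, ignores the flag */
	if (s->called_set_creds)
		return;		/* like the other LSMs' early return */
	s->once_work++;
}

/* After: the once-only part has its own hook, called before the first pass. */
static void new_bprm_creds_for_exec(struct model_creds_state *s)
{
	s->once_work++;
}

static void new_bprm_set_creds(struct model_creds_state *s)
{
	s->per_pass_work++;
}

/* 'passes' is the number of prepare_binprm() calls, e.g. 2 for a #! script. */
static struct model_creds_state model_run_passes(int passes, bool new_style)
{
	struct model_creds_state s = { false, 0, 0 };

	if (new_style)
		new_bprm_creds_for_exec(&s);
	for (int i = 0; i < passes; i++) {
		if (new_style) {
			new_bprm_set_creds(&s);
		} else {
			old_bprm_set_creds(&s);
			s.called_set_creds = true;	/* as prepare_binprm() used to do */
		}
	}
	return s;
}
```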

Remove bprm->called_set_creds, as all of its former users have been
moved to security_bprm_creds_for_exec.

Add or update comments as appropriate to bring them up to date and
to reflect this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                          |  6 +++-
 include/linux/binfmts.h            | 18 +++--------
 include/linux/lsm_hook_defs.h      |  1 +
 include/linux/lsm_hooks.h          | 50 +++++++++++++++++-------------
 include/linux/security.h           |  6 ++++
 security/apparmor/domain.c         |  7 ++---
 security/apparmor/include/domain.h |  2 +-
 security/apparmor/lsm.c            |  2 +-
 security/security.c                |  5 +++
 security/selinux/hooks.c           |  8 ++---
 security/smack/smack_lsm.c         |  9 ++----
 security/tomoyo/tomoyo.c           | 12 ++-----
 12 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 14b786158aa9..9e70da47f8d9 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1640,7 +1640,6 @@ int prepare_binprm(struct linux_binprm *bprm)
 	retval = security_bprm_set_creds(bprm);
 	if (retval)
 		return retval;
-	bprm->called_set_creds = 1;
 
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
@@ -1855,6 +1854,11 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval < 0)
 		goto out;
 
+	/* Set the unchanging part of bprm->cred */
+	retval = security_bprm_creds_for_exec(bprm);
+	if (retval)
+		goto out;
+
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		goto out;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 1b48e2154766..d1217fcdedea 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -27,22 +27,14 @@ struct linux_binprm {
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
 		/*
-		 * True after the bprm_set_creds hook has been called once
-		 * (multiple calls can be made via prepare_binprm() for
-		 * binfmt_script/misc).
-		 */
-		called_set_creds:1,
-		/*
-		 * True if most recent call to the commoncaps bprm_set_creds
-		 * hook (due to multiple prepare_binprm() calls from the
-		 * binfmt_script/misc handlers) resulted in elevated
-		 * privileges.
+		 * True if most recent call to cap_bprm_set_creds
+		 * resulted in elevated privileges.
 		 */
 		cap_elevated:1,
 		/*
-		 * Set by bprm_set_creds hook to indicate a privilege-gaining
-		 * exec has happened. Used to sanitize execution environment
-		 * and to set AT_SECURE auxv for glibc.
+		 * Set by bprm_creds_for_exec hook to indicate a
+		 * privilege-gaining exec has happened. Used to set
+		 * AT_SECURE auxv for glibc.
 		 */
 		secureexec:1,
 		/*
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 9cd4455528e5..aab0695f41df 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -49,6 +49,7 @@ LSM_HOOK(int, 0, syslog, int type)
 LSM_HOOK(int, 0, settime, const struct timespec64 *ts,
 	 const struct timezone *tz)
 LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages)
+LSM_HOOK(int, 0, bprm_creds_for_exec, struct linux_binprm *bprm)
 LSM_HOOK(int, 0, bprm_set_creds, struct linux_binprm *bprm)
 LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 988ca0df7824..c719af37df20 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -34,40 +34,46 @@
  *
  * Security hooks for program execution operations.
  *
+ * @bprm_creds_for_exec:
+ *	If the setup in prepare_exec_creds did not setup @bprm->cred->security
+ *	properly for executing @bprm->file, update the LSM's portion of
+ *	@bprm->cred->security to be what commit_creds needs to install for the
+ *	new program.  This hook may also optionally check permissions
+ *	(e.g. for transitions between security domains).
+ *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
+ *	request libc enable secure mode.
+ *	@bprm contains the linux_binprm structure.
+ *	Return 0 if the hook is successful and permission is granted.
  * @bprm_set_creds:
- *	Save security information in the bprm->security field, typically based
- *	on information about the bprm->file, for later use by the apply_creds
- *	hook.  This hook may also optionally check permissions (e.g. for
+ *	Assuming that the relevant bits of @bprm->cred->security have been
+ *	previously set, examine @bprm->file and regenerate them.  This is
+ *	so that the credentials derived from the interpreter the code is
+ *	actually going to run are used rather than credentials derived
+ *	from a script.  This is done because the interpreter binary needs to
+ *	reopen the script, and may end up opening something completely different.
+ *	This hook may also optionally check permissions (e.g. for
  *	transitions between security domains).
- *	This hook may be called multiple times during a single execve, e.g. for
- *	interpreters.  The hook can tell whether it has already been called by
- *	checking to see if @bprm->security is non-NULL.  If so, then the hook
- *	may decide either to retain the security information saved earlier or
- *	to replace it.  The hook must set @bprm->secureexec to 1 if a "secure
- *	exec" has happened as a result of this hook call.  The flag is used to
- *	indicate the need for a sanitized execution environment, and is also
- *	passed in the ELF auxiliary table on the initial stack to indicate
- *	whether libc should enable secure mode.
+ *	The hook must set @bprm->cap_elevated to 1 if AT_SECURE should be set to
+ *	request libc enable secure mode.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
  * @bprm_check_security:
  *	This hook mediates the point when a search for a binary handler will
- *	begin.  It allows a check the @bprm->security value which is set in the
- *	preceding set_creds call.  The primary difference from set_creds is
- *	that the argv list and envp list are reliably available in @bprm.  This
- *	hook may be called multiple times during a single execve; and in each
- *	pass set_creds is called first.
+ *	begin.  It allows a check against the @bprm->cred->security value
+ *	which was set in the preceding creds_for_exec call.  The argv list and
+ *	envp list are reliably available in @bprm.  This hook may be called
+ *	multiple times during a single execve.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
  * @bprm_committing_creds:
  *	Prepare to install the new security attributes of a process being
  *	transformed by an execve operation, based on the old credentials
  *	pointed to by @current->cred and the information set in @bprm->cred by
- *	the bprm_set_creds hook.  @bprm points to the linux_binprm structure.
- *	This hook is a good place to perform state changes on the process such
- *	as closing open file descriptors to which access will no longer be
- *	granted when the attributes are changed.  This is called immediately
- *	before commit_creds().
+ *	the bprm_creds_for_exec hook.  @bprm points to the linux_binprm
+ *	structure.  This hook is a good place to perform state changes on the
+ *	process such as closing open file descriptors to which access will no
+ *	longer be granted when the attributes are changed.  This is called
+ *	immediately before commit_creds().
  * @bprm_committed_creds:
  *	Tidy up after the installation of the new security attributes of a
  *	process being transformed by an execve operation.  The new credentials
diff --git a/include/linux/security.h b/include/linux/security.h
index a8d9310472df..1bd7a6582775 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -276,6 +276,7 @@ int security_quota_on(struct dentry *dentry);
 int security_syslog(int type);
 int security_settime64(const struct timespec64 *ts, const struct timezone *tz);
 int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
+int security_bprm_creds_for_exec(struct linux_binprm *bprm);
 int security_bprm_set_creds(struct linux_binprm *bprm);
 int security_bprm_check(struct linux_binprm *bprm);
 void security_bprm_committing_creds(struct linux_binprm *bprm);
@@ -569,6 +570,11 @@ static inline int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
 	return __vm_enough_memory(mm, pages, cap_vm_enough_memory(mm, pages));
 }
 
+static inline int security_bprm_creds_for_exec(struct linux_binprm *bprm)
+{
+	return 0;
+}
+
 static inline int security_bprm_set_creds(struct linux_binprm *bprm)
 {
 	return cap_bprm_set_creds(bprm);
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 6ceb74e0f789..0b870a647488 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -854,14 +854,14 @@ static struct aa_label *handle_onexec(struct aa_label *label,
 }
 
 /**
- * apparmor_bprm_set_creds - set the new creds on the bprm struct
+ * apparmor_bprm_creds_for_exec - Update the new creds on the bprm struct
  * @bprm: binprm for the exec  (NOT NULL)
  *
  * Returns: %0 or error on failure
  *
  * TODO: once the other paths are done see if we can't refactor into a fn
  */
-int apparmor_bprm_set_creds(struct linux_binprm *bprm)
+int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm)
 {
 	struct aa_task_ctx *ctx;
 	struct aa_label *label, *new = NULL;
@@ -875,9 +875,6 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
 		file_inode(bprm->file)->i_mode
 	};
 
-	if (bprm->called_set_creds)
-		return 0;
-
 	ctx = task_ctx(current);
 	AA_BUG(!cred_label(bprm->cred));
 	AA_BUG(!ctx);
diff --git a/security/apparmor/include/domain.h b/security/apparmor/include/domain.h
index 21b875fe2d37..d14928fe1c6f 100644
--- a/security/apparmor/include/domain.h
+++ b/security/apparmor/include/domain.h
@@ -30,7 +30,7 @@ struct aa_domain {
 struct aa_label *x_table_lookup(struct aa_profile *profile, u32 xindex,
 				const char **name);
 
-int apparmor_bprm_set_creds(struct linux_binprm *bprm);
+int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm);
 
 void aa_free_domain_entries(struct aa_domain *domain);
 int aa_change_hat(const char *hats[], int count, u64 token, int flags);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index b621ad74f54a..3623ab08279d 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -1232,7 +1232,7 @@ static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(cred_prepare, apparmor_cred_prepare),
 	LSM_HOOK_INIT(cred_transfer, apparmor_cred_transfer),
 
-	LSM_HOOK_INIT(bprm_set_creds, apparmor_bprm_set_creds),
+	LSM_HOOK_INIT(bprm_creds_for_exec, apparmor_bprm_creds_for_exec),
 	LSM_HOOK_INIT(bprm_committing_creds, apparmor_bprm_committing_creds),
 	LSM_HOOK_INIT(bprm_committed_creds, apparmor_bprm_committed_creds),
 
diff --git a/security/security.c b/security/security.c
index 7fed24b9d57e..4ee76a729f73 100644
--- a/security/security.c
+++ b/security/security.c
@@ -823,6 +823,11 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
 	return __vm_enough_memory(mm, pages, cap_sys_admin);
 }
 
+int security_bprm_creds_for_exec(struct linux_binprm *bprm)
+{
+	return call_int_hook(bprm_creds_for_exec, 0, bprm);
+}
+
 int security_bprm_set_creds(struct linux_binprm *bprm)
 {
 	return call_int_hook(bprm_set_creds, 0, bprm);
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 0b4e32161b77..718345dd76bb 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2286,7 +2286,7 @@ static int check_nnp_nosuid(const struct linux_binprm *bprm,
 	return -EACCES;
 }
 
-static int selinux_bprm_set_creds(struct linux_binprm *bprm)
+static int selinux_bprm_creds_for_exec(struct linux_binprm *bprm)
 {
 	const struct task_security_struct *old_tsec;
 	struct task_security_struct *new_tsec;
@@ -2297,8 +2297,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
 
 	/* SELinux context only depends on initial program or script and not
 	 * the script interpreter */
-	if (bprm->called_set_creds)
-		return 0;
 
 	old_tsec = selinux_cred(current_cred());
 	new_tsec = selinux_cred(bprm->cred);
@@ -6385,7 +6383,7 @@ static int selinux_setprocattr(const char *name, void *value, size_t size)
 	/* Permission checking based on the specified context is
 	   performed during the actual operation (execve,
 	   open/mkdir/...), when we know the full context of the
-	   operation.  See selinux_bprm_set_creds for the execve
+	   operation.  See selinux_bprm_creds_for_exec for the execve
 	   checks and may_create for the file creation checks. The
 	   operation will then fail if the context is not permitted. */
 	tsec = selinux_cred(new);
@@ -6914,7 +6912,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 
 	LSM_HOOK_INIT(netlink_send, selinux_netlink_send),
 
-	LSM_HOOK_INIT(bprm_set_creds, selinux_bprm_set_creds),
+	LSM_HOOK_INIT(bprm_creds_for_exec, selinux_bprm_creds_for_exec),
 	LSM_HOOK_INIT(bprm_committing_creds, selinux_bprm_committing_creds),
 	LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds),
 
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 8c61d175e195..0ac8f4518d07 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -891,12 +891,12 @@ static int smack_sb_statfs(struct dentry *dentry)
  */
 
 /**
- * smack_bprm_set_creds - set creds for exec
+ * smack_bprm_creds_for_exec - Update bprm->cred if needed for exec
  * @bprm: the exec information
  *
  * Returns 0 if it gets a blob, -EPERM if exec forbidden and -ENOMEM otherwise
  */
-static int smack_bprm_set_creds(struct linux_binprm *bprm)
+static int smack_bprm_creds_for_exec(struct linux_binprm *bprm)
 {
 	struct inode *inode = file_inode(bprm->file);
 	struct task_smack *bsp = smack_cred(bprm->cred);
@@ -904,9 +904,6 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
 	struct superblock_smack *sbsp;
 	int rc;
 
-	if (bprm->called_set_creds)
-		return 0;
-
 	isp = smack_inode(inode);
 	if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task)
 		return 0;
@@ -4598,7 +4595,7 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(sb_statfs, smack_sb_statfs),
 	LSM_HOOK_INIT(sb_set_mnt_opts, smack_set_mnt_opts),
 
-	LSM_HOOK_INIT(bprm_set_creds, smack_bprm_set_creds),
+	LSM_HOOK_INIT(bprm_creds_for_exec, smack_bprm_creds_for_exec),
 
 	LSM_HOOK_INIT(inode_alloc_security, smack_inode_alloc_security),
 	LSM_HOOK_INIT(inode_init_security, smack_inode_init_security),
diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
index 716c92ec941a..f9adddc42ac8 100644
--- a/security/tomoyo/tomoyo.c
+++ b/security/tomoyo/tomoyo.c
@@ -63,20 +63,14 @@ static void tomoyo_bprm_committed_creds(struct linux_binprm *bprm)
 
 #ifndef CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER
 /**
- * tomoyo_bprm_set_creds - Target for security_bprm_set_creds().
+ * tomoyo_bprm_creds_for_exec - Target for security_bprm_creds_for_exec().
  *
  * @bprm: Pointer to "struct linux_binprm".
  *
  * Returns 0.
  */
-static int tomoyo_bprm_set_creds(struct linux_binprm *bprm)
+static int tomoyo_bprm_creds_for_exec(struct linux_binprm *bprm)
 {
-	/*
-	 * Do only if this function is called for the first time of an execve
-	 * operation.
-	 */
-	if (bprm->called_set_creds)
-		return 0;
 	/*
 	 * Load policy if /sbin/tomoyo-init exists and /sbin/init is requested
 	 * for the first time.
@@ -539,7 +533,7 @@ static struct security_hook_list tomoyo_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(task_alloc, tomoyo_task_alloc),
 	LSM_HOOK_INIT(task_free, tomoyo_task_free),
 #ifndef CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER
-	LSM_HOOK_INIT(bprm_set_creds, tomoyo_bprm_set_creds),
+	LSM_HOOK_INIT(bprm_creds_for_exec, tomoyo_bprm_creds_for_exec),
 #endif
 	LSM_HOOK_INIT(bprm_check_security, tomoyo_bprm_check_security),
 	LSM_HOOK_INIT(file_fcntl, tomoyo_file_fcntl),
-- 
2.25.0



* [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
  2020-05-19  0:29       ` [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids Eric W. Biederman
  2020-05-19  0:30       ` [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds Eric W. Biederman
@ 2020-05-19  0:31       ` Eric W. Biederman
  2020-05-19 18:21         ` Kees Cook
  2020-05-19 21:52         ` James Morris
  2020-05-19  0:31       ` [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
                         ` (7 subsequent siblings)
  10 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:31 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Rename bprm->cap_elevated to bprm->active_secureexec and initialize it
in prepare_binprm instead of in cap_bprm_set_creds.  Initializing
bprm->active_secureexec in prepare_binprm allows multiple
implementations of security_bprm_repopulate_creds to play nicely with
each other.

Rename security_bprm_set_creds to security_bprm_repopulate_creds to
emphasize that this path recomputes part of bprm->cred.  This
recomputation avoids the time of check vs time of use problems that
are inherent in unix #! interpreters.

In short: two renames, plus moving the place where
bprm->active_secureexec is initialized.
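One way to see why the reset moved: if each repopulate hook cleared the
flag itself (as cap_bprm_set_creds did with "cap_elevated = 0"), a hook
running later could wipe out the decision of a hook that ran earlier.
With the reset hoisted into prepare_binprm, hooks may only raise the
flag, and begin_new_exec folds the result of the final pass into
secureexec.  The userspace model below is a sketch under that reading;
struct binprm_model, hook_setid, hook_fscaps, repopulate and commit are
hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct linux_binprm. */
struct binprm_model {
	unsigned int active_secureexec:1; /* raised by this pass's hooks */
	unsigned int secureexec:1;        /* final decision, set at commit */
	bool file_is_setuid;              /* stands in for inspecting bprm->file */
	bool file_has_fscaps;
};

/* Two independent hooks: each may only ever raise the flag. */
static int hook_setid(struct binprm_model *b)
{
	if (b->file_is_setuid)
		b->active_secureexec = 1;
	return 0;
}

static int hook_fscaps(struct binprm_model *b)
{
	if (b->file_has_fscaps)
		b->active_secureexec = 1;
	return 0;
}

/* Models the patched prepare_binprm(): reset once, then run every hook. */
static int repopulate(struct binprm_model *b)
{
	int (*hooks[])(struct binprm_model *) = { hook_setid, hook_fscaps };

	b->active_secureexec = 0;
	for (size_t i = 0; i < sizeof(hooks) / sizeof(hooks[0]); i++) {
		int ret = hooks[i](b);
		if (ret)
			return ret;
	}
	return 0;
}

/* Models begin_new_exec(): fold the last pass into secureexec. */
static void commit(struct binprm_model *b)
{
	b->secureexec |= b->active_secureexec;
}
```

Because the reset happens once per pass, a re-exec through a non-setuid
interpreter correctly drops the flag, while the order in which the
hooks run no longer matters.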

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/exec.c                     | 8 ++++----
 include/linux/binfmts.h       | 4 ++--
 include/linux/lsm_hook_defs.h | 2 +-
 include/linux/lsm_hooks.h     | 4 ++--
 include/linux/security.h      | 8 ++++----
 security/commoncap.c          | 9 ++++-----
 security/security.c           | 4 ++--
 7 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 9e70da47f8d9..8e3b93d51d31 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1366,7 +1366,7 @@ int begin_new_exec(struct linux_binprm * bprm)
 	 * the final state of setuid/setgid/fscaps can be merged into the
 	 * secureexec flag.
 	 */
-	bprm->secureexec |= bprm->cap_elevated;
+	bprm->secureexec |= bprm->active_secureexec;
 
 	if (bprm->secureexec) {
 		/* Make sure parent cannot signal privileged process. */
@@ -1634,10 +1634,10 @@ int prepare_binprm(struct linux_binprm *bprm)
 	int retval;
 	loff_t pos = 0;
 
+	/* Recompute parts of bprm->cred based on bprm->file */
+	bprm->active_secureexec = 0;
 	bprm_fill_uid(bprm);
-
-	/* fill in binprm security blob */
-	retval = security_bprm_set_creds(bprm);
+	retval = security_bprm_repopulate_creds(bprm);
 	if (retval)
 		return retval;
 
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index d1217fcdedea..8605ab4a0f89 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -27,10 +27,10 @@ struct linux_binprm {
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
 		/*
-		 * True if most recent call to cap_bprm_set_creds
+		 * True if most recent call to security_bprm_set_creds
 		 * resulted in elevated privileges.
 		 */
-		cap_elevated:1,
+		active_secureexec:1,
 		/*
 		 * Set by bprm_creds_for_exec hook to indicate a
 		 * privilege-gaining exec has happened. Used to set
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index aab0695f41df..1e295ba12c0d 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -50,7 +50,7 @@ LSM_HOOK(int, 0, settime, const struct timespec64 *ts,
 	 const struct timezone *tz)
 LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages)
 LSM_HOOK(int, 0, bprm_creds_for_exec, struct linux_binprm *bprm)
-LSM_HOOK(int, 0, bprm_set_creds, struct linux_binprm *bprm)
+LSM_HOOK(int, 0, bprm_repopulate_creds, struct linux_binprm *bprm)
 LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm)
 LSM_HOOK(void, LSM_RET_VOID, bprm_committed_creds, struct linux_binprm *bprm)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c719af37df20..d618ecc4d660 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -44,7 +44,7 @@
  *	request libc enable secure mode.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
- * @bprm_set_creds:
+ * @bprm_repopulate_creds:
  *	Assuming that the relevant bits of @bprm->cred->security have been
  *	previously set, examine @bprm->file and regenerate them.  This is
  *	so that the credentials derived from the interpreter the code is
@@ -53,7 +53,7 @@
  *	reopen script, and may end up opening something completely different.
  *	This hook may also optionally check permissions (e.g. for
  *	transitions between security domains).
- *	The hook must set @bprm->cap_elevated to 1 if AT_SECURE should be set to
+ *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
  *	request libc enable secure mode.
  *	@bprm contains the linux_binprm structure.
  *	Return 0 if the hook is successful and permission is granted.
diff --git a/include/linux/security.h b/include/linux/security.h
index 1bd7a6582775..d23f078eb589 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -140,7 +140,7 @@ extern int cap_capset(struct cred *new, const struct cred *old,
 		      const kernel_cap_t *effective,
 		      const kernel_cap_t *inheritable,
 		      const kernel_cap_t *permitted);
-extern int cap_bprm_set_creds(struct linux_binprm *bprm);
+extern int cap_bprm_repopulate_creds(struct linux_binprm *bprm);
 extern int cap_inode_setxattr(struct dentry *dentry, const char *name,
 			      const void *value, size_t size, int flags);
 extern int cap_inode_removexattr(struct dentry *dentry, const char *name);
@@ -277,7 +277,7 @@ int security_syslog(int type);
 int security_settime64(const struct timespec64 *ts, const struct timezone *tz);
 int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
 int security_bprm_creds_for_exec(struct linux_binprm *bprm);
-int security_bprm_set_creds(struct linux_binprm *bprm);
+int security_bprm_repopulate_creds(struct linux_binprm *bprm);
 int security_bprm_check(struct linux_binprm *bprm);
 void security_bprm_committing_creds(struct linux_binprm *bprm);
 void security_bprm_committed_creds(struct linux_binprm *bprm);
@@ -575,9 +575,9 @@ static inline int security_bprm_creds_for_exec(struct linux_binprm *bprm)
 	return 0;
 }
 
-static inline int security_bprm_set_creds(struct linux_binprm *bprm)
+static inline int security_bprm_repopulate_creds(struct linux_binprm *bprm)
 {
-	return cap_bprm_set_creds(bprm);
+	return cap_bprm_repopulate_creds(bprm);
 }
 
 static inline int security_bprm_check(struct linux_binprm *bprm)
diff --git a/security/commoncap.c b/security/commoncap.c
index f4ee0ae106b2..045b5b80ea40 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -797,14 +797,14 @@ static inline bool nonroot_raised_pE(struct cred *new, const struct cred *old,
 }
 
 /**
- * cap_bprm_set_creds - Set up the proposed credentials for execve().
+ * cap_bprm_repopulate_creds - Set up the proposed credentials for execve().
  * @bprm: The execution parameters, including the proposed creds
  *
  * Set up the proposed credentials for a new execution context being
  * constructed by execve().  The proposed creds in @bprm->cred is altered,
  * which won't take effect immediately.  Returns 0 if successful, -ve on error.
  */
-int cap_bprm_set_creds(struct linux_binprm *bprm)
+int cap_bprm_repopulate_creds(struct linux_binprm *bprm)
 {
 	const struct cred *old = current_cred();
 	struct cred *new = bprm->cred;
@@ -884,12 +884,11 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 		return -EPERM;
 
 	/* Check for privilege-elevated exec. */
-	bprm->cap_elevated = 0;
 	if (is_setid ||
 	    (!__is_real(root_uid, new) &&
 	     (effective ||
 	      __cap_grew(permitted, ambient, new))))
-		bprm->cap_elevated = 1;
+		bprm->active_secureexec = 1;
 
 	return 0;
 }
@@ -1346,7 +1345,7 @@ static struct security_hook_list capability_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(ptrace_traceme, cap_ptrace_traceme),
 	LSM_HOOK_INIT(capget, cap_capget),
 	LSM_HOOK_INIT(capset, cap_capset),
-	LSM_HOOK_INIT(bprm_set_creds, cap_bprm_set_creds),
+	LSM_HOOK_INIT(bprm_repopulate_creds, cap_bprm_repopulate_creds),
 	LSM_HOOK_INIT(inode_need_killpriv, cap_inode_need_killpriv),
 	LSM_HOOK_INIT(inode_killpriv, cap_inode_killpriv),
 	LSM_HOOK_INIT(inode_getsecurity, cap_inode_getsecurity),
diff --git a/security/security.c b/security/security.c
index 4ee76a729f73..b890b7e2a765 100644
--- a/security/security.c
+++ b/security/security.c
@@ -828,9 +828,9 @@ int security_bprm_creds_for_exec(struct linux_binprm *bprm)
 	return call_int_hook(bprm_creds_for_exec, 0, bprm);
 }
 
-int security_bprm_set_creds(struct linux_binprm *bprm)
+int security_bprm_repopulate_creds(struct linux_binprm *bprm)
 {
-	return call_int_hook(bprm_set_creds, 0, bprm);
+	return call_int_hook(bprm_repopulate_creds, 0, bprm);
 }
 
 int security_bprm_check(struct linux_binprm *bprm)
-- 
2.25.0



* [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
                         ` (2 preceding siblings ...)
  2020-05-19  0:31       ` [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds Eric W. Biederman
@ 2020-05-19  0:31       ` Eric W. Biederman
  2020-05-19 18:27         ` Kees Cook
  2020-05-19  0:32       ` [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
                         ` (6 subsequent siblings)
  10 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:31 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Add a flag preserve_creds that binfmt_misc can set to prevent
credentials from being updated.  This allows binfmt_misc to always
call prepare_binprm, which in turn allows the credential computation
logic to be consolidated.

Not replacing the credentials with the interpreter's credentials is
safe because an open file descriptor to the executable is passed to
the interpreter.  As the interpreter does not need to reopen the
executable, it is guaranteed to see the same file that exec sees.
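The flag is one-shot: prepare_binprm honours it for the pass that
follows binfmt_misc switching bprm->file to the interpreter, then
clears it so any later pass recomputes credentials as usual.  The
userspace model below is a minimal sketch of that behaviour;
struct binprm_model, prepare and the cred_source field are
hypothetical and only mirror the logic in the fs/exec.c hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of struct linux_binprm. */
struct binprm_model {
	unsigned int preserve_creds:1;
	int cred_recomputations; /* how often creds were recomputed */
	const char *file;        /* file currently in bprm->file */
	const char *cred_source; /* file the creds were derived from */
};

/* Models the patched prepare_binprm() credential step. */
static int prepare(struct binprm_model *b)
{
	if (!b->preserve_creds) {
		/* Recompute creds from whatever bprm->file is now. */
		b->cred_source = b->file;
		b->cred_recomputations++;
	}
	b->preserve_creds = 0; /* only ever skips a single pass */
	return 0;
}
```

For example, after the creds are computed from the binary itself, a
MISC_FMT_CREDENTIALS handler sets preserve_creds and swaps in the
interpreter; the next pass keeps the binary's creds, and any pass after
that recomputes them.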

Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/binfmt_misc.c        | 15 +++------------
 fs/exec.c               | 19 ++++++++++++-------
 include/linux/binfmts.h |  2 ++
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index cdb45829354d..264829745d6f 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -218,19 +218,10 @@ static int load_misc_binary(struct linux_binprm *bprm)
 		goto error;
 
 	bprm->file = interp_file;
-	if (fmt->flags & MISC_FMT_CREDENTIALS) {
-		loff_t pos = 0;
-
-		/*
-		 * No need to call prepare_binprm(), it's already been
-		 * done.  bprm->buf is stale, update from interp_file.
-		 */
-		memset(bprm->buf, 0, BINPRM_BUF_SIZE);
-		retval = kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE,
-				&pos);
-	} else
-		retval = prepare_binprm(bprm);
+	if (fmt->flags & MISC_FMT_CREDENTIALS)
+		bprm->preserve_creds = 1;
 
+	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		goto error;
 
diff --git a/fs/exec.c b/fs/exec.c
index 8e3b93d51d31..028e0e323af5 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1631,15 +1631,20 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
  */
 int prepare_binprm(struct linux_binprm *bprm)
 {
-	int retval;
 	loff_t pos = 0;
 
-	/* Recompute parts of bprm->cred based on bprm->file */
-	bprm->active_secureexec = 0;
-	bprm_fill_uid(bprm);
-	retval = security_bprm_repopulate_creds(bprm);
-	if (retval)
-		return retval;
+	/* Can the interpreter get to the executable without races? */
+	if (!bprm->preserve_creds) {
+		int retval;
+
+		/* Recompute parts of bprm->cred based on bprm->file */
+		bprm->active_secureexec = 0;
+		bprm_fill_uid(bprm);
+		retval = security_bprm_repopulate_creds(bprm);
+		if (retval)
+			return retval;
+	}
+	bprm->preserve_creds = 0;
 
 	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 8605ab4a0f89..dbb5614d62a2 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,6 +26,8 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
+		/* It is safe to use the creds of a script (see binfmt_misc) */
+		preserve_creds:1,
 		/*
 		 * True if most recent call to security_bprm_set_creds
 		 * resulted in elevated privileges.
-- 
2.25.0



* [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
                         ` (3 preceding siblings ...)
  2020-05-19  0:31       ` [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
@ 2020-05-19  0:32       ` Eric W. Biederman
  2020-05-19 18:27         ` Kees Cook
  2020-05-19 21:30         ` James Morris
  2020-05-19  0:33       ` [PATCH v2 6/8] exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC Eric W. Biederman
                         ` (5 subsequent siblings)
  10 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:32 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


The code in prepare_binprm needs to be run every time
search_binary_handler is called, so move the call into
search_binary_handler itself to make the code simpler and easier to
understand.
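After this change a loader such as binfmt_script only swaps in the
interpreter and calls search_binary_handler, which runs prepare_binprm
on every level of the chain and still enforces the depth limit.  The
model below is a sketch of that control flow in userspace; the names
(struct binprm_model, prepare, load_script_like, search_handler,
chain_len) are hypothetical, and the depth check mirrors the
"recursion_depth > 5" test visible in the fs/exec.c hunk.

```c
#include <assert.h>
#include <errno.h>

struct binprm_model {
	int recursion_depth;
	int prepares;  /* how many times prepare() ran */
	int chain_len; /* remaining "#!" indirections before a native binary */
};

static int prepare(struct binprm_model *b)
{
	b->prepares++; /* stands in for recomputing creds and reading bprm->buf */
	return 0;
}

static int search_handler(struct binprm_model *b);

/* A script-like loader: switch to the interpreter and search again. */
static int load_script_like(struct binprm_model *b)
{
	b->chain_len--;
	/* No prepare() here any more: the search does it. */
	return search_handler(b);
}

static int search_handler(struct binprm_model *b)
{
	int ret;

	if (b->recursion_depth > 5)
		return -ELOOP;
	ret = prepare(b);
	if (ret < 0)
		return ret;
	if (b->chain_len == 0)
		return 0; /* found a native binary */
	b->recursion_depth++;
	ret = load_script_like(b);
	b->recursion_depth--;
	return ret;
}
```

A script whose interpreter is itself a script (chain_len of 2) gets
three prepare passes, one per file examined, without any loader having
to remember to call it.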

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/alpha/kernel/binfmt_loader.c |  3 ---
 fs/binfmt_em86.c                  |  4 ----
 fs/binfmt_misc.c                  |  4 ----
 fs/binfmt_script.c                |  3 ---
 fs/exec.c                         | 12 +++++-------
 include/linux/binfmts.h           |  1 -
 6 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c
index a8d0d6e06526..d712ba51d15a 100644
--- a/arch/alpha/kernel/binfmt_loader.c
+++ b/arch/alpha/kernel/binfmt_loader.c
@@ -35,9 +35,6 @@ static int load_binary(struct linux_binprm *bprm)
 
 	bprm->file = file;
 	bprm->loader = loader;
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
 	return search_binary_handler(bprm);
 }
 
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index 466497860c62..cedde2341ade 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -91,10 +91,6 @@ static int load_em86(struct linux_binprm *bprm)
 
 	bprm->file = file;
 
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
-
 	return search_binary_handler(bprm);
 }
 
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 264829745d6f..50a73afdf9b7 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -221,10 +221,6 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (fmt->flags & MISC_FMT_CREDENTIALS)
 		bprm->preserve_creds = 1;
 
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		goto error;
-
 	retval = search_binary_handler(bprm);
 	if (retval < 0)
 		goto error;
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index e9e6a6f4a35f..8d718d8fd0fe 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -143,9 +143,6 @@ static int load_script(struct linux_binprm *bprm)
 		return PTR_ERR(file);
 
 	bprm->file = file;
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		return retval;
 	return search_binary_handler(bprm);
 }
 
diff --git a/fs/exec.c b/fs/exec.c
index 028e0e323af5..5fc458460e44 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1629,7 +1629,7 @@ static void bprm_fill_uid(struct linux_binprm *bprm)
  *
  * This may be called multiple times for binary chains (scripts for example).
  */
-int prepare_binprm(struct linux_binprm *bprm)
+static int prepare_binprm(struct linux_binprm *bprm)
 {
 	loff_t pos = 0;
 
@@ -1650,8 +1650,6 @@ int prepare_binprm(struct linux_binprm *bprm)
 	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
 }
 
-EXPORT_SYMBOL(prepare_binprm);
-
 /*
  * Arguments are '\0' separated strings found at the location bprm->p
  * points to; chop off the first by relocating brpm->p to right after
@@ -1707,6 +1705,10 @@ int search_binary_handler(struct linux_binprm *bprm)
 	if (bprm->recursion_depth > 5)
 		return -ELOOP;
 
+	retval = prepare_binprm(bprm);
+	if (retval < 0)
+		return retval;
+
 	retval = security_bprm_check(bprm);
 	if (retval)
 		return retval;
@@ -1864,10 +1866,6 @@ static int __do_execve_file(int fd, struct filename *filename,
 	if (retval)
 		goto out;
 
-	retval = prepare_binprm(bprm);
-	if (retval < 0)
-		goto out;
-
 	retval = copy_strings_kernel(1, &bprm->filename, bprm);
 	if (retval < 0)
 		goto out;
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index dbb5614d62a2..8c7779d6bf19 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -116,7 +116,6 @@ static inline void insert_binfmt(struct linux_binfmt *fmt)
 
 extern void unregister_binfmt(struct linux_binfmt *);
 
-extern int prepare_binprm(struct linux_binprm *);
 extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *);
 extern int begin_new_exec(struct linux_binprm * bprm);
-- 
2.25.0



* [PATCH v2 6/8] exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
                         ` (4 preceding siblings ...)
  2020-05-19  0:32       ` [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
@ 2020-05-19  0:33       ` Eric W. Biederman
  2020-05-19 19:08         ` Kees Cook
  2020-05-19  0:33       ` [PATCH v2 7/8] exec: Generic execfd support Eric W. Biederman
                         ` (4 subsequent siblings)
  10 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:33 UTC
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


The return code -ENOEXEC tells search_binary_handler that it should
continue searching for a binfmt to handle the given file.  Since the
remaining handlers need bprm->buf to continue the search, returning
-ENOEXEC after modifying bprm->buf is problematic.

The current binfmt_script escapes this problem because it closes and
clears bprm->file before returning -ENOEXEC with bprm->buf modified.
This prevents search_binary_handler from looping, as it explicitly
handles a NULL bprm->file.

I plan on moving all of the bprm->file management into fs/exec.c and
out of the binary handlers, so this will become a problem.

Move closing bprm->file and the test for BINPRM_FLAGS_PATH_INACCESSIBLE
down below the last return of -ENOEXEC.

Introduce i_sep and i_end to track the end of the first argument and
the end of the parameters respectively.  Using those, constification
of all char * pointers, and the helpers next_terminator and
next_non_spacetab guarantees that the parameter parsing will not
modify bprm->buf.

Only modify bprm->buf to terminate the strings i_arg and i_name with
'\0' for passing to copy_strings_kernel.

When replacing loops with next_non_spacetab and next_terminator, care
has been taken that the logic of the parsing code (short of replacing
characters by '\0') remains the same.
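The parsing strategy can be exercised outside the kernel.  The sketch
below follows the new load_script logic: it locates i_name, i_sep,
i_arg and i_end with const pointers and writes the two '\0'
terminators only once every -ENOEXEC exit is behind it.  BUF_SIZE,
find_newline (a stand-in for the kernel's strnchr) and parse_shebang
are assumptions made for this userspace model, not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 128 /* assumed; the kernel uses BINPRM_BUF_SIZE */

static bool spacetab(char c) { return c == ' ' || c == '\t'; }

static const char *next_non_spacetab(const char *first, const char *last)
{
	for (; first <= last; first++)
		if (!spacetab(*first))
			return first;
	return NULL;
}

static const char *next_terminator(const char *first, const char *last)
{
	for (; first <= last; first++)
		if (spacetab(*first) || !*first)
			return first;
	return NULL;
}

/* Like strnchr(): stops at '\n' or at the first NUL within count bytes. */
static const char *find_newline(const char *s, size_t count)
{
	for (; count-- && *s != '\0'; s++)
		if (*s == '\n')
			return s;
	return NULL;
}

/* On -ENOEXEC, buf is left untouched; on success, name/arg are set. */
static int parse_shebang(char *buf, const char **name, const char **arg)
{
	const char *buf_end = buf + BUF_SIZE - 1;
	const char *i_name, *i_sep, *i_arg = NULL, *i_end;

	assert(buf != NULL);
	if (buf[0] != '#' || buf[1] != '!')
		return -ENOEXEC;
	i_end = find_newline(buf, BUF_SIZE);
	if (!i_end) {
		i_end = next_non_spacetab(buf + 2, buf_end);
		if (!i_end)
			return -ENOEXEC; /* entire buf is spaces/tabs */
		if (!next_terminator(i_end, buf_end))
			return -ENOEXEC; /* interpreter path truncated */
		i_end = buf_end;
	}
	while (spacetab(i_end[-1])) /* trim trailing spaces/tabs */
		i_end--;
	i_name = next_non_spacetab(buf + 2, i_end);
	if (!i_name || i_name == i_end)
		return -ENOEXEC; /* no interpreter name */
	i_sep = next_terminator(i_name, i_end);
	if (i_sep && *i_sep != '\0')
		i_arg = next_non_spacetab(i_sep, i_end);

	/* Past the last -ENOEXEC: only now is buf written. */
	*(char *)i_end = '\0';
	if (i_arg)
		*(char *)i_sep = '\0';
	*name = i_name;
	*arg = i_arg;
	return 0;
}
```

With "#!/usr/bin/env python3 -u\n" this yields the name "/usr/bin/env"
and the single argument "python3 -u", while a line such as "#!   \n"
fails with -ENOEXEC and leaves the buffer byte-for-byte unchanged, so
the next binfmt sees the same bytes.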

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/binfmt_script.c | 80 ++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 42 deletions(-)

diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 8d718d8fd0fe..85e0ef86eb11 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -16,14 +16,14 @@
 #include <linux/fs.h>
 
 static inline bool spacetab(char c) { return c == ' ' || c == '\t'; }
-static inline char *next_non_spacetab(char *first, const char *last)
+static inline const char *next_non_spacetab(const char *first, const char *last)
 {
 	for (; first <= last; first++)
 		if (!spacetab(*first))
 			return first;
 	return NULL;
 }
-static inline char *next_terminator(char *first, const char *last)
+static inline const char *next_terminator(const char *first, const char *last)
 {
 	for (; first <= last; first++)
 		if (spacetab(*first) || !*first)
@@ -33,8 +33,7 @@ static inline char *next_terminator(char *first, const char *last)
 
 static int load_script(struct linux_binprm *bprm)
 {
-	const char *i_arg, *i_name;
-	char *cp, *buf_end;
+	const char *i_name, *i_sep, *i_arg, *i_end, *buf_end;
 	struct file *file;
 	int retval;
 
@@ -42,20 +41,6 @@ static int load_script(struct linux_binprm *bprm)
 	if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
 		return -ENOEXEC;
 
-	/*
-	 * If the script filename will be inaccessible after exec, typically
-	 * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
-	 * up now (on the assumption that the interpreter will want to load
-	 * this file).
-	 */
-	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
-		return -ENOENT;
-
-	/* Release since we are not mapping a binary into memory. */
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
-
 	/*
 	 * This section handles parsing the #! line into separate
 	 * interpreter path and argument strings. We must be careful
@@ -71,39 +56,48 @@ static int load_script(struct linux_binprm *bprm)
 	 * parse them on its own.
 	 */
 	buf_end = bprm->buf + sizeof(bprm->buf) - 1;
-	cp = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
-	if (!cp) {
-		cp = next_non_spacetab(bprm->buf + 2, buf_end);
-		if (!cp)
+	i_end = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
+	if (!i_end) {
+		i_end = next_non_spacetab(bprm->buf + 2, buf_end);
+		if (!i_end)
 			return -ENOEXEC; /* Entire buf is spaces/tabs */
 		/*
 		 * If there is no later space/tab/NUL we must assume the
 		 * interpreter path is truncated.
 		 */
-		if (!next_terminator(cp, buf_end))
+		if (!next_terminator(i_end, buf_end))
 			return -ENOEXEC;
-		cp = buf_end;
+		i_end = buf_end;
 	}
-	/* NUL-terminate the buffer and any trailing spaces/tabs. */
-	*cp = '\0';
-	while (cp > bprm->buf) {
-		cp--;
-		if ((*cp == ' ') || (*cp == '\t'))
-			*cp = '\0';
-		else
-			break;
-	}
-	for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
-	if (*cp == '\0')
+	/* Trim any trailing spaces/tabs from i_end */
+	while (spacetab(i_end[-1]))
+		i_end--;
+
+	/* Skip over leading spaces/tabs */
+	i_name = next_non_spacetab(bprm->buf+2, i_end);
+	if (!i_name || (i_name == i_end))
 		return -ENOEXEC; /* No interpreter name found */
-	i_name = cp;
+
+	/* Is there an optional argument? */
 	i_arg = NULL;
-	for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
-		/* nothing */ ;
-	while ((*cp == ' ') || (*cp == '\t'))
-		*cp++ = '\0';
-	if (*cp)
-		i_arg = cp;
+	i_sep = next_terminator(i_name, i_end);
+	if (i_sep && (*i_sep != '\0'))
+		i_arg = next_non_spacetab(i_sep, i_end);
+
+	/*
+	 * If the script filename will be inaccessible after exec, typically
+	 * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
+	 * up now (on the assumption that the interpreter will want to load
+	 * this file).
+	 */
+	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
+		return -ENOENT;
+
+	/* Release since we are not mapping a binary into memory. */
+	allow_write_access(bprm->file);
+	fput(bprm->file);
+	bprm->file = NULL;
+
 	/*
 	 * OK, we've parsed out the interpreter name and
 	 * (optional) argument.
@@ -121,7 +115,9 @@ static int load_script(struct linux_binprm *bprm)
 	if (retval < 0)
 		return retval;
 	bprm->argc++;
+	*((char *)i_end) = '\0';
 	if (i_arg) {
+		*((char *)i_sep) = '\0';
 		retval = copy_strings_kernel(1, &i_arg, bprm);
 		if (retval < 0)
 			return retval;
-- 
2.25.0
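
The non-destructive parsing in the load_script() hunk above can be sketched in userspace C. This is a simplified, hypothetical model, not kernel code. It locates the interpreter name and the optional argument as pointer/length pairs instead of writing NUL bytes into the buffer, so a caller that rejects the line leaves the buffer untouched. The names parse_shebang and struct shebang are made up for illustration, and the kernel's exact handling of the last byte of bprm->buf is not reproduced.

```c
#include <stddef.h>
#include <string.h>

/* Result of parsing a "#!" line: pointers into the caller's buffer. */
struct shebang {
	const char *name;	/* interpreter path (not NUL-terminated) */
	size_t name_len;
	const char *arg;	/* optional argument, or NULL */
	size_t arg_len;
};

static int spacetab(char c)
{
	return c == ' ' || c == '\t';
}

/*
 * Parse "#!interp [arg]" from buf[0..len) without writing to buf.
 * Returns 0 on success, -1 if the line is unusable.
 */
static int parse_shebang(const char *buf, size_t len, struct shebang *out)
{
	const char *end, *p, *sep;

	if (len < 2 || buf[0] != '#' || buf[1] != '!')
		return -1;

	end = memchr(buf, '\n', len);
	if (!end) {
		/* No newline: require a space, tab or NUL after the name,
		 * otherwise the path may have been truncated. */
		p = buf + 2;
		while (p < buf + len && spacetab(*p))
			p++;
		if (p == buf + len)
			return -1;	/* only spaces/tabs */
		while (p < buf + len && !spacetab(*p) && *p != '\0')
			p++;
		if (p == buf + len)
			return -1;	/* name may be truncated */
		end = buf + len;
	}

	/* Trim trailing spaces/tabs. */
	while (end > buf + 2 && spacetab(end[-1]))
		end--;

	/* Skip leading spaces/tabs to find the interpreter name. */
	for (p = buf + 2; p < end && spacetab(*p); p++)
		;
	if (p == end)
		return -1;	/* no interpreter name */

	/* The name ends at the first space, tab or NUL. */
	for (sep = p; sep < end && !spacetab(*sep) && *sep != '\0'; sep++)
		;
	out->name = p;
	out->name_len = (size_t)(sep - p);
	out->arg = NULL;
	out->arg_len = 0;

	/* Everything after the separator is one optional argument. */
	if (sep < end && *sep != '\0') {
		for (p = sep; p < end && spacetab(*p); p++)
			;
		out->arg = p;
		out->arg_len = (size_t)(end - p);
	}
	return 0;
}
```

Like the kernel, the sketch treats everything after the first run of spaces/tabs as a single argument, so "#!/usr/bin/env python3 -u" yields the argument "python3 -u".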



* [PATCH v2 7/8] exec: Generic execfd support
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
                         ` (5 preceding siblings ...)
  2020-05-19  0:33       ` [PATCH v2 6/8] exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC Eric W. Biederman
@ 2020-05-19  0:33       ` Eric W. Biederman
  2020-05-19 19:46         ` Kees Cook
  2020-05-19 21:59         ` Rob Landley
  2020-05-19  0:34       ` [PATCH v2 8/8] exec: Remove recursion from search_binary_handler Eric W. Biederman
                         ` (3 subsequent siblings)
  10 siblings, 2 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Most of the support for passing the file descriptor of an executable
to an interpreter already lives in the generic code and in binfmt_elf.
Rework the fields in binfmt_elf that deal with executable file
descriptor passing to make executable file descriptor passing a
first-class concept.
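
On the interpreter side, the descriptor arrives as the AT_EXECFD entry of the auxiliary vector, which is what the NEW_AUX_ENT(AT_EXECFD, bprm->execfd) lines in the diffs below emit. A minimal userspace sketch for reading it, assuming glibc 2.19 or later (where getauxval() sets errno to ENOENT for a missing entry); execfd_from_auxv is a hypothetical helper name. With binfmt_misc, the entry is only present when the format was registered with the open-binary (O) flag, MISC_FMT_OPEN_BINARY in the code.

```c
#include <elf.h>	/* AT_EXECFD */
#include <errno.h>
#include <sys/auxv.h>	/* getauxval() */

/*
 * Return the descriptor passed in the AT_EXECFD auxv entry, or -1 if the
 * kernel did not supply one (the usual case for a directly exec'd program).
 * Assumes glibc >= 2.19: getauxval() returns 0 and sets errno to ENOENT
 * when the entry is missing, which distinguishes "absent" from fd 0.
 */
static int execfd_from_auxv(void)
{
	unsigned long val;

	errno = 0;
	val = getauxval(AT_EXECFD);
	if (val == 0 && errno == ENOENT)
		return -1;
	return (int)val;
}
```

An interpreter registered with the O flag could then read its program through this descriptor (for example with pread(2)) rather than reopening the path it receives as argv[1].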

Move the fd_install from binfmt_misc into begin_new_exec, after the new
creds have been installed.  This means that an access to the file
through /proc/<pid>/fd/N is checked against the creds of the new
executable before access to the new executable's files is allowed.

Installing the executable's file descriptor after the point of no
return also means that nothing special needs to be done on error: an
error there ends with the process exiting, and exiting closes all of
its open files.

Move the would_dump from binfmt_misc into begin_new_exec right
after would_dump is called on the bprm->file.  This makes it
obvious this case exists and that no nesting of bprm->file is
currently supported.
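
The effect of calling would_dump() on both files can be approximated in userspace. This is a hypothetical sketch, not the kernel's implementation. It uses access(2) with R_OK, which checks the real UID, where the kernel uses inode_permission(). It also only sets a flag bit with the same value as BINPRM_FLAGS_ENFORCE_NONDUMP. The names model_would_dump and model_dump_flags are made up. The point is that an unreadable executable marks the exec non-dumpable even when the interpreter itself is readable.

```c
#include <unistd.h>	/* access(), R_OK, NULL */

/* Same bit value as BINPRM_FLAGS_ENFORCE_NONDUMP in binfmts.h. */
#define MODEL_ENFORCE_NONDUMP (1u << 0)

/* Hypothetical analogue of would_dump(): an unreadable file forces
 * the exec to be non-dumpable. */
static void model_would_dump(unsigned *interp_flags, const char *path)
{
	if (access(path, R_OK) != 0)
		*interp_flags |= MODEL_ENFORCE_NONDUMP;
}

/* Check the file being run and, when an execfd is passed, the
 * original executable too, as begin_new_exec() now does. */
static unsigned model_dump_flags(const char *file, int have_execfd,
				 const char *executable)
{
	unsigned flags = 0;

	model_would_dump(&flags, file);
	if (have_execfd)
		model_would_dump(&flags, executable);
	return flags;
}
```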

In binfmt_misc, the movement of fd_install into generic code means
that its special error exit path is no longer needed.
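
The ownership rule that makes the special error path unnecessary can be modelled with a plain reference count. In this hypothetical sketch, fake_file, binprm_model and the model_* functions are illustrative and not kernel API. The bprm owns the reference in ->executable until it is installed in a descriptor table. Installation moves the reference and clears the pointer, so a single free routine, in the spirit of free_bprm(), drops it only if it was never handed over. Here the caller picks the descriptor number, whereas the kernel uses get_unused_fd_flags().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct file: only a reference count. */
struct fake_file {
	int refs;
};

/* Hypothetical, trimmed-down view of the fields added by this patch. */
struct binprm_model {
	unsigned int have_execfd:1;
	struct fake_file *executable;	/* reference owned by the bprm */
	int execfd;
};

static void fake_fput(struct fake_file *f)
{
	f->refs--;
}

/*
 * After the point of no return: move the reference into the
 * (modelled) descriptor table and clear ->executable, so the bprm
 * no longer owns it.  The caller supplies a free slot @fd.
 */
static void model_install_execfd(struct binprm_model *b,
				 struct fake_file *fdtable[], int fd)
{
	if (!b->have_execfd)
		return;
	fdtable[fd] = b->executable;
	b->executable = NULL;
	b->execfd = fd;
}

/* The only cleanup path, in the spirit of free_bprm(). */
static void model_free_bprm(struct binprm_model *b)
{
	if (b->executable)
		fake_fput(b->executable);
}
```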

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 fs/binfmt_elf.c         |  4 ++--
 fs/binfmt_elf_fdpic.c   |  4 ++--
 fs/binfmt_misc.c        | 40 ++++++++--------------------------------
 fs/exec.c               | 15 +++++++++++++++
 include/linux/binfmts.h | 10 +++++-----
 5 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 396d5c2e6b5e..441c85f04dfd 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -273,8 +273,8 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 		NEW_AUX_ENT(AT_BASE_PLATFORM,
 			    (elf_addr_t)(unsigned long)u_base_platform);
 	}
-	if (bprm->interp_flags & BINPRM_FLAGS_EXECFD) {
-		NEW_AUX_ENT(AT_EXECFD, bprm->interp_data);
+	if (bprm->have_execfd) {
+		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
 	}
 #undef NEW_AUX_ENT
 	/* AT_NULL is zero; clear the rest too */
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 896e3ca9bf85..2d5e9eb12075 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -628,10 +628,10 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
 			    (elf_addr_t) (unsigned long) u_base_platform);
 	}
 
-	if (bprm->interp_flags & BINPRM_FLAGS_EXECFD) {
+	if (bprm->have_execfd) {
 		nr = 0;
 		csp -= 2 * sizeof(unsigned long);
-		NEW_AUX_ENT(AT_EXECFD, bprm->interp_data);
+		NEW_AUX_ENT(AT_EXECFD, bprm->execfd);
 	}
 
 	nr = 0;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 50a73afdf9b7..ad2866f28f0c 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -134,7 +134,6 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	Node *fmt;
 	struct file *interp_file = NULL;
 	int retval;
-	int fd_binary = -1;
 
 	retval = -ENOEXEC;
 	if (!enabled)
@@ -161,29 +160,12 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	}
 
 	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
-
-		/* if the binary should be opened on behalf of the
-		 * interpreter than keep it open and assign descriptor
-		 * to it
-		 */
-		fd_binary = get_unused_fd_flags(0);
-		if (fd_binary < 0) {
-			retval = fd_binary;
-			goto ret;
-		}
-		fd_install(fd_binary, bprm->file);
-
-		/* if the binary is not readable than enforce mm->dumpable=0
-		   regardless of the interpreter's permissions */
-		would_dump(bprm, bprm->file);
+		/* Pass the open binary to the interpreter */
+		bprm->have_execfd = 1;
+		bprm->executable = bprm->file;
 
 		allow_write_access(bprm->file);
 		bprm->file = NULL;
-
-		/* mark the bprm that fd should be passed to interp */
-		bprm->interp_flags |= BINPRM_FLAGS_EXECFD;
-		bprm->interp_data = fd_binary;
-
 	} else {
 		allow_write_access(bprm->file);
 		fput(bprm->file);
@@ -192,19 +174,19 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	/* make argv[1] be the path to the binary */
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0)
-		goto error;
+		goto ret;
 	bprm->argc++;
 
 	/* add the interp as argv[0] */
 	retval = copy_strings_kernel(1, &fmt->interpreter, bprm);
 	if (retval < 0)
-		goto error;
+		goto ret;
 	bprm->argc++;
 
 	/* Update interp in case binfmt_script needs it. */
 	retval = bprm_change_interp(fmt->interpreter, bprm);
 	if (retval < 0)
-		goto error;
+		goto ret;
 
 	if (fmt->flags & MISC_FMT_OPEN_FILE) {
 		interp_file = file_clone_open(fmt->interp_file);
@@ -215,7 +197,7 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	}
 	retval = PTR_ERR(interp_file);
 	if (IS_ERR(interp_file))
-		goto error;
+		goto ret;
 
 	bprm->file = interp_file;
 	if (fmt->flags & MISC_FMT_CREDENTIALS)
@@ -223,17 +205,11 @@ static int load_misc_binary(struct linux_binprm *bprm)
 
 	retval = search_binary_handler(bprm);
 	if (retval < 0)
-		goto error;
+		goto ret;
 
 ret:
 	dput(fmt->dentry);
 	return retval;
-error:
-	if (fd_binary > 0)
-		ksys_close(fd_binary);
-	bprm->interp_flags = 0;
-	bprm->interp_data = 0;
-	goto ret;
 }
 
 /* Command parsers */
diff --git a/fs/exec.c b/fs/exec.c
index 5fc458460e44..ca91393893ea 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1323,7 +1323,10 @@ int begin_new_exec(struct linux_binprm * bprm)
 	 */
 	set_mm_exe_file(bprm->mm, bprm->file);
 
+	/* If the binary is not readable than enforce mm->dumpable=0 */
 	would_dump(bprm, bprm->file);
+	if (bprm->have_execfd)
+		would_dump(bprm, bprm->executable);
 
 	/*
 	 * Release all of the old mmap stuff
@@ -1427,6 +1430,16 @@ int begin_new_exec(struct linux_binprm * bprm)
 	 * credentials; any time after this it may be unlocked.
 	 */
 	security_bprm_committed_creds(bprm);
+
+	/* Pass the opened binary to the interpreter. */
+	if (bprm->have_execfd) {
+		retval = get_unused_fd_flags(0);
+		if (retval < 0)
+			goto out_unlock;
+		fd_install(retval, bprm->executable);
+		bprm->executable = NULL;
+		bprm->execfd = retval;
+	}
 	return 0;
 
 out_unlock:
@@ -1516,6 +1529,8 @@ static void free_bprm(struct linux_binprm *bprm)
 		allow_write_access(bprm->file);
 		fput(bprm->file);
 	}
+	if (bprm->executable)
+		fput(bprm->executable);
 	/* If a binfmt changed the interp, free it. */
 	if (bprm->interp != bprm->filename)
 		kfree(bprm->interp);
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 8c7779d6bf19..653508b25815 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -26,6 +26,9 @@ struct linux_binprm {
 	unsigned long p; /* current top of mem */
 	unsigned long argmin; /* rlimit marker for copy_strings() */
 	unsigned int
+		/* Should an execfd be passed to userspace? */
+		have_execfd:1,
+
 		/* It is safe to use the creds of a script (see binfmt_misc) */
 		preserve_creds:1,
 		/*
@@ -48,6 +51,7 @@ struct linux_binprm {
 	unsigned int taso:1;
 #endif
 	unsigned int recursion_depth; /* only for search_binary_handler() */
+	struct file * executable; /* Executable to pass to the interpreter */
 	struct file * file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
@@ -58,7 +62,7 @@ struct linux_binprm {
 				   of the time same as filename, but could be
 				   different for binfmt_{misc,script} */
 	unsigned interp_flags;
-	unsigned interp_data;
+	int execfd;		/* File descriptor of the executable */
 	unsigned long loader, exec;
 
 	struct rlimit rlim_stack; /* Saved RLIMIT_STACK used during exec. */
@@ -69,10 +73,6 @@ struct linux_binprm {
 #define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
 #define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)
 
-/* fd of the binary should be passed to the interpreter */
-#define BINPRM_FLAGS_EXECFD_BIT 1
-#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)
-
 /* filename of the binary will be inaccessible after exec */
 #define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
 #define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)
-- 
2.25.0



* [PATCH v2 8/8] exec: Remove recursion from search_binary_handler
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
                         ` (6 preceding siblings ...)
  2020-05-19  0:33       ` [PATCH v2 7/8] exec: Generic execfd support Eric W. Biederman
@ 2020-05-19  0:34       ` Eric W. Biederman
  2020-05-19 20:37         ` Kees Cook
  2020-05-19  1:25       ` [PATCH v2 0/8] exec: Control flow simplifications Linus Torvalds
                         ` (2 subsequent siblings)
  10 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19  0:34 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Recursion in kernel code is generally a bad idea as it can overflow
the kernel stack.  Recursion in exec also hides that the code is
looping and that the loop changes bprm->file.

Instead of recursing in search_binary_handler have the methods that
would recurse set bprm->interpreter and return 0.  Modify exec_binprm
to loop when bprm->interpreter is set.  Consolidate all of the
reassignments of bprm->file in that loop to make it clear what is
going on.

The structure of the new loop in exec_binprm is that all errors return
immediately, while successful completion (ret == 0 &&
!bprm->interpreter) just breaks out of the loop and runs what
exec_binprm has always run upon successful completion.
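
The shape of the new loop can be tried out in userspace. In this hypothetical model, each image either names an interpreter (a script) or does not (a native binary), and the stand-in for search_binary_handler() only reports the interpreter instead of recursing. The depth limit and the order of the checks follow the exec_binprm() hunk below; the types and function names are made up, and write access, credentials and execfd handling are left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical image: a script names its interpreter, a binary does not. */
struct image {
	const char *name;
	const struct image *interp;	/* NULL for a native binary */
};

struct bprm_loop_model {
	const struct image *file;
	const struct image *interpreter;
};

/* Stand-in for a handler: set ->interpreter and return 0, no recursion. */
static int model_search_binary_handler(struct bprm_loop_model *b)
{
	b->interpreter = b->file->interp;
	return 0;
}

/* Iterative driver in the shape of the new exec_binprm() loop. */
static int model_exec_binprm(struct bprm_loop_model *b)
{
	int depth, ret;

	for (depth = 0;; depth++) {
		if (depth > 5)
			return -ELOOP;		/* same limit as the patch */
		ret = model_search_binary_handler(b);
		if (ret < 0)
			return ret;		/* errors return immediately */
		if (!b->interpreter)
			break;			/* a real binary was found */
		b->file = b->interpreter;	/* the one place ->file changes */
		b->interpreter = NULL;
	}
	return 0;
}
```

Because the handlers no longer call back into search_binary_handler(), kernel stack usage stays flat no matter how many interpreter levels are tried; the depth counter alone bounds the work.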

Fail if an interpreter is being called after execfd has been set.
The code has never properly handled an interpreter being called with
execfd set.  Now that the reassignments of bprm->file and the
assignment of bprm->executable live in generic code, it has finally
become possible to test for this problematic condition and fail when
it happens.

With the reassignments of bprm->file and the assignment of
bprm->executable moved into the generic code, add a test to see if
bprm->executable is being reassigned.
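
The reassignment step, including the new execfd check, can be looked at in isolation. This hypothetical sketch mirrors the body of the loop in the exec_binprm() hunk below, with a counter standing in for struct file references; allow_write_access() is omitted, and ffile, bprm_fd_model and model_switch_to_interpreter are made-up names. A second interpreter switch with have_execfd set would need to keep a second executable for fd passing, which the generic code does not support, so it fails with -ENOEXEC.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields the check needs. */
struct ffile {
	int refs;
};

struct bprm_fd_model {
	unsigned int have_execfd:1;
	struct ffile *file, *interpreter, *executable;
};

static void ffput(struct ffile *f)
{
	f->refs--;
}

/* One loop iteration after a handler has chosen an interpreter. */
static int model_switch_to_interpreter(struct bprm_fd_model *b)
{
	struct ffile *exec = b->file;

	b->file = b->interpreter;
	b->interpreter = NULL;
	if (b->have_execfd) {
		if (b->executable) {	/* would need a second execfd */
			ffput(exec);
			return -ENOEXEC;
		}
		b->executable = exec;	/* keep the reference for fd passing */
	} else {
		ffput(exec);		/* the old file is no longer needed */
	}
	return 0;
}
```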

In search_binary_handler remove the test for !bprm->file.  With all
reassignments of bprm->file moved to exec_binprm, bprm->file can never
be NULL in search_binary_handler.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 arch/alpha/kernel/binfmt_loader.c |  8 ++---
 fs/binfmt_em86.c                  |  9 ++----
 fs/binfmt_misc.c                  | 18 ++---------
 fs/binfmt_script.c                |  9 ++----
 fs/exec.c                         | 51 ++++++++++++++++++++-----------
 include/linux/binfmts.h           |  3 +-
 6 files changed, 43 insertions(+), 55 deletions(-)

diff --git a/arch/alpha/kernel/binfmt_loader.c b/arch/alpha/kernel/binfmt_loader.c
index d712ba51d15a..e4be7a543ecf 100644
--- a/arch/alpha/kernel/binfmt_loader.c
+++ b/arch/alpha/kernel/binfmt_loader.c
@@ -19,10 +19,6 @@ static int load_binary(struct linux_binprm *bprm)
 	if (bprm->loader)
 		return -ENOEXEC;
 
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
-
 	loader = bprm->vma->vm_end - sizeof(void *);
 
 	file = open_exec("/sbin/loader");
@@ -33,9 +29,9 @@ static int load_binary(struct linux_binprm *bprm)
 	/* Remember if the application is TASO.  */
 	bprm->taso = eh->ah.entry < 0x100000000UL;
 
-	bprm->file = file;
+	bprm->interpreter = file;
 	bprm->loader = loader;
-	return search_binary_handler(bprm);
+	return 0;
 }
 
 static struct linux_binfmt loader_format = {
diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index cedde2341ade..995883693cb2 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -48,10 +48,6 @@ static int load_em86(struct linux_binprm *bprm)
 	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
 		return -ENOENT;
 
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
-
 	/* Unlike in the script case, we don't have to do any hairy
 	 * parsing to find our interpreter... it's hardcoded!
 	 */
@@ -89,9 +85,8 @@ static int load_em86(struct linux_binprm *bprm)
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	bprm->file = file;
-
-	return search_binary_handler(bprm);
+	bprm->interpreter = file;
+	return 0;
 }
 
 static struct linux_binfmt em86_format = {
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index ad2866f28f0c..53968ea07b57 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -159,18 +159,9 @@ static int load_misc_binary(struct linux_binprm *bprm)
 			goto ret;
 	}
 
-	if (fmt->flags & MISC_FMT_OPEN_BINARY) {
-		/* Pass the open binary to the interpreter */
+	if (fmt->flags & MISC_FMT_OPEN_BINARY)
 		bprm->have_execfd = 1;
-		bprm->executable = bprm->file;
 
-		allow_write_access(bprm->file);
-		bprm->file = NULL;
-	} else {
-		allow_write_access(bprm->file);
-		fput(bprm->file);
-		bprm->file = NULL;
-	}
 	/* make argv[1] be the path to the binary */
 	retval = copy_strings_kernel(1, &bprm->interp, bprm);
 	if (retval < 0)
@@ -199,14 +190,11 @@ static int load_misc_binary(struct linux_binprm *bprm)
 	if (IS_ERR(interp_file))
 		goto ret;
 
-	bprm->file = interp_file;
+	bprm->interpreter = interp_file;
 	if (fmt->flags & MISC_FMT_CREDENTIALS)
 		bprm->preserve_creds = 1;
 
-	retval = search_binary_handler(bprm);
-	if (retval < 0)
-		goto ret;
-
+	retval = 0;
 ret:
 	dput(fmt->dentry);
 	return retval;
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 85e0ef86eb11..0e8b953d12cf 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -93,11 +93,6 @@ static int load_script(struct linux_binprm *bprm)
 	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
 		return -ENOENT;
 
-	/* Release since we are not mapping a binary into memory. */
-	allow_write_access(bprm->file);
-	fput(bprm->file);
-	bprm->file = NULL;
-
 	/*
 	 * OK, we've parsed out the interpreter name and
 	 * (optional) argument.
@@ -138,8 +133,8 @@ static int load_script(struct linux_binprm *bprm)
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
-	bprm->file = file;
-	return search_binary_handler(bprm);
+	bprm->interpreter = file;
+	return 0;
 }
 
 static struct linux_binfmt script_format = {
diff --git a/fs/exec.c b/fs/exec.c
index ca91393893ea..47d831e5efde 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1710,16 +1710,12 @@ EXPORT_SYMBOL(remove_arg_zero);
 /*
  * cycle the list of binary formats handler, until one recognizes the image
  */
-int search_binary_handler(struct linux_binprm *bprm)
+static int search_binary_handler(struct linux_binprm *bprm)
 {
 	bool need_retry = IS_ENABLED(CONFIG_MODULES);
 	struct linux_binfmt *fmt;
 	int retval;
 
-	/* This allows 4 levels of binfmt rewrites before failing hard. */
-	if (bprm->recursion_depth > 5)
-		return -ELOOP;
-
 	retval = prepare_binprm(bprm);
 	if (retval < 0)
 		return retval;
@@ -1736,14 +1732,11 @@ int search_binary_handler(struct linux_binprm *bprm)
 			continue;
 		read_unlock(&binfmt_lock);
 
-		bprm->recursion_depth++;
 		retval = fmt->load_binary(bprm);
-		bprm->recursion_depth--;
 
 		read_lock(&binfmt_lock);
 		put_binfmt(fmt);
-		if (bprm->point_of_no_return || !bprm->file ||
-		    (retval != -ENOEXEC)) {
+		if (bprm->point_of_no_return || (retval != -ENOEXEC)) {
 			read_unlock(&binfmt_lock);
 			return retval;
 		}
@@ -1762,12 +1755,11 @@ int search_binary_handler(struct linux_binprm *bprm)
 
 	return retval;
 }
-EXPORT_SYMBOL(search_binary_handler);
 
 static int exec_binprm(struct linux_binprm *bprm)
 {
 	pid_t old_pid, old_vpid;
-	int ret;
+	int ret, depth;
 
 	/* Need to fetch pid before load_binary changes it */
 	old_pid = current->pid;
@@ -1775,15 +1767,38 @@ static int exec_binprm(struct linux_binprm *bprm)
 	old_vpid = task_pid_nr_ns(current, task_active_pid_ns(current->parent));
 	rcu_read_unlock();
 
-	ret = search_binary_handler(bprm);
-	if (ret >= 0) {
-		audit_bprm(bprm);
-		trace_sched_process_exec(current, old_pid, bprm);
-		ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
-		proc_exec_connector(current);
+	/* This allows 4 levels of binfmt rewrites before failing hard. */
+	for (depth = 0;; depth++) {
+		struct file *exec;
+		if (depth > 5)
+			return -ELOOP;
+
+		ret = search_binary_handler(bprm);
+		if (ret < 0)
+			return ret;
+		if (!bprm->interpreter)
+			break;
+
+		exec = bprm->file;
+		bprm->file = bprm->interpreter;
+		bprm->interpreter = NULL;
+
+		allow_write_access(exec);
+		if (unlikely(bprm->have_execfd)) {
+			if (bprm->executable) {
+				fput(exec);
+				return -ENOEXEC;
+			}
+			bprm->executable = exec;
+		} else
+			fput(exec);
 	}
 
-	return ret;
+	audit_bprm(bprm);
+	trace_sched_process_exec(current, old_pid, bprm);
+	ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
+	proc_exec_connector(current);
+	return 0;
 }
 
 /*
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 653508b25815..7fc05929c967 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -50,8 +50,8 @@ struct linux_binprm {
 #ifdef __alpha__
 	unsigned int taso:1;
 #endif
-	unsigned int recursion_depth; /* only for search_binary_handler() */
 	struct file * executable; /* Executable to pass to the interpreter */
+	struct file * interpreter;
 	struct file * file;
 	struct cred *cred;	/* new credentials */
 	int unsafe;		/* how unsafe this exec is (mask of LSM_UNSAFE_*) */
@@ -117,7 +117,6 @@ static inline void insert_binfmt(struct linux_binfmt *fmt)
 extern void unregister_binfmt(struct linux_binfmt *);
 
 extern int __must_check remove_arg_zero(struct linux_binprm *);
-extern int search_binary_handler(struct linux_binprm *);
 extern int begin_new_exec(struct linux_binprm * bprm);
 extern void setup_new_exec(struct linux_binprm * bprm);
 extern void finalize_exec(struct linux_binprm *bprm);
-- 
2.25.0



* Re: [PATCH v2 0/8] exec: Control flow simplifications
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
                         ` (7 preceding siblings ...)
  2020-05-19  0:34       ` [PATCH v2 8/8] exec: Remove recursion from search_binary_handler Eric W. Biederman
@ 2020-05-19  1:25       ` Linus Torvalds
  2020-05-19 21:55       ` Kees Cook
  2020-05-20 22:12       ` Eric W. Biederman
  10 siblings, 0 replies; 122+ messages in thread
From: Linus Torvalds @ 2020-05-19  1:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Linux Kernel Mailing List, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	LSM List, James Morris, Serge E. Hallyn, Andy Lutomirski

On Mon, May 18, 2020 at 5:32 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> It is hard to follow the control flow in exec.c as the code has evolved over
> time and something that used to work one way now works another.  This set of
> changes attempts to address the worst of that, to remove unnecessary work
> and to make the code a little easier to follow.

It is indeed hard to follow, and maybe I missed something, but from
what I can tell, your series looks all sane. It certainly seems to
make things much more straightforward.

Of course, exactly _because_ it's such a messy area, maybe it
introduces something odd, but all the patches look relatively
straightforward. And you remove more lines of code than you add, which
is always nice to see.

So ack from me.

Oleg? Jann? Anybody? Do you see anything strange that I missed?

                Linus


* Re: [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
  2020-05-19  0:30       ` [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds Eric W. Biederman
@ 2020-05-19 15:34         ` Casey Schaufler
  2020-05-19 18:10         ` Kees Cook
  1 sibling, 0 replies; 122+ messages in thread
From: Casey Schaufler @ 2020-05-19 15:34 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, linux-security-module,
	James Morris, Serge E. Hallyn, Andy Lutomirski

On 5/18/2020 5:30 PM, Eric W. Biederman wrote:
> Today security_bprm_set_creds has several implementations:
> apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
> smack_bprm_set_creds, and tomoyo_bprm_set_creds.
>
> Except for cap_bprm_set_creds, they all test bprm->called_set_creds and
> return immediately if it is true.  The function cap_bprm_set_creds
> ignores bprm->called_set_creds entirely.
>
> Create a new LSM hook security_bprm_creds_for_exec that is called just
> before prepare_binprm in __do_execve_file, resulting in an LSM hook
> that is called exactly once for the whole of exec.  Move the bits
> of security_bprm_set_creds that only want to be called once per exec
> into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
> behind.
>
> Remove bprm->called_set_creds; all of its former users have been moved
> to security_bprm_creds_for_exec.
>
> Add or update comments as appropriate to bring them up to date and
> to reflect this change.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

For the LSM and Smack bits

Acked-by: Casey Schaufler <casey@schaufler-ca.com>
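
The call pattern the quoted description aims for can be shown with counters: one creds_for_exec call per exec, and one set_creds call per examined image. This is a hypothetical model, not the LSM framework. The hook functions are plain counters, and model_exec stands in for __do_execve_file() plus the binfmt_script/binfmt_misc passes that call prepare_binprm() again.

```c
/* Hypothetical counters standing in for the two LSM hooks. */
struct hook_calls {
	int creds_for_exec;
	int set_creds;
};

static int model_creds_for_exec(struct hook_calls *c)
{
	c->creds_for_exec++;
	return 0;
}

static int model_set_creds(struct hook_calls *c)
{
	c->set_creds++;
	return 0;
}

/*
 * @images is how many times prepare_binprm() runs: once for the file
 * passed to execve, plus once more for each interpreter that a
 * binfmt_script/binfmt_misc pass switches to.
 */
static int model_exec(struct hook_calls *c, int images)
{
	int i, ret;

	ret = model_creds_for_exec(c);	/* exactly once, before prepare_binprm() */
	if (ret)
		return ret;
	for (i = 0; i < images; i++) {
		ret = model_set_creds(c);	/* once per prepare_binprm() */
		if (ret)
			return ret;
	}
	return 0;
}
```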

> ---
>  fs/exec.c                          |  6 +++-
>  include/linux/binfmts.h            | 18 +++--------
>  include/linux/lsm_hook_defs.h      |  1 +
>  include/linux/lsm_hooks.h          | 50 +++++++++++++++++-------------
>  include/linux/security.h           |  6 ++++
>  security/apparmor/domain.c         |  7 ++---
>  security/apparmor/include/domain.h |  2 +-
>  security/apparmor/lsm.c            |  2 +-
>  security/security.c                |  5 +++
>  security/selinux/hooks.c           |  8 ++---
>  security/smack/smack_lsm.c         |  9 ++----
>  security/tomoyo/tomoyo.c           | 12 ++-----
>  12 files changed, 63 insertions(+), 63 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 14b786158aa9..9e70da47f8d9 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1640,7 +1640,6 @@ int prepare_binprm(struct linux_binprm *bprm)
>  	retval = security_bprm_set_creds(bprm);
>  	if (retval)
>  		return retval;
> -	bprm->called_set_creds = 1;
>  
>  	memset(bprm->buf, 0, BINPRM_BUF_SIZE);
>  	return kernel_read(bprm->file, bprm->buf, BINPRM_BUF_SIZE, &pos);
> @@ -1855,6 +1854,11 @@ static int __do_execve_file(int fd, struct filename *filename,
>  	if (retval < 0)
>  		goto out;
>  
> +	/* Set the unchanging part of bprm->cred */
> +	retval = security_bprm_creds_for_exec(bprm);
> +	if (retval)
> +		goto out;
> +
>  	retval = prepare_binprm(bprm);
>  	if (retval < 0)
>  		goto out;
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 1b48e2154766..d1217fcdedea 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -27,22 +27,14 @@ struct linux_binprm {
>  	unsigned long argmin; /* rlimit marker for copy_strings() */
>  	unsigned int
>  		/*
> -		 * True after the bprm_set_creds hook has been called once
> -		 * (multiple calls can be made via prepare_binprm() for
> -		 * binfmt_script/misc).
> -		 */
> -		called_set_creds:1,
> -		/*
> -		 * True if most recent call to the commoncaps bprm_set_creds
> -		 * hook (due to multiple prepare_binprm() calls from the
> -		 * binfmt_script/misc handlers) resulted in elevated
> -		 * privileges.
> +		 * True if most recent call to cap_bprm_set_creds
> +		 * resulted in elevated privileges.
>  		 */
>  		cap_elevated:1,
>  		/*
> -		 * Set by bprm_set_creds hook to indicate a privilege-gaining
> -		 * exec has happened. Used to sanitize execution environment
> -		 * and to set AT_SECURE auxv for glibc.
> +		 * Set by bprm_creds_for_exec hook to indicate a
> +		 * privilege-gaining exec has happened. Used to set
> +		 * AT_SECURE auxv for glibc.
>  		 */
>  		secureexec:1,
>  		/*
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 9cd4455528e5..aab0695f41df 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -49,6 +49,7 @@ LSM_HOOK(int, 0, syslog, int type)
>  LSM_HOOK(int, 0, settime, const struct timespec64 *ts,
>  	 const struct timezone *tz)
>  LSM_HOOK(int, 0, vm_enough_memory, struct mm_struct *mm, long pages)
> +LSM_HOOK(int, 0, bprm_creds_for_exec, struct linux_binprm *bprm)
>  LSM_HOOK(int, 0, bprm_set_creds, struct linux_binprm *bprm)
>  LSM_HOOK(int, 0, bprm_check_security, struct linux_binprm *bprm)
>  LSM_HOOK(void, LSM_RET_VOID, bprm_committing_creds, struct linux_binprm *bprm)
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 988ca0df7824..c719af37df20 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -34,40 +34,46 @@
>   *
>   * Security hooks for program execution operations.
>   *
> + * @bprm_creds_for_exec:
> + *	If the setup in prepare_exec_creds did not setup @bprm->cred->security
> + *	properly for executing @bprm->file, update the LSM's portion of
> + *	@bprm->cred->security to be what commit_creds needs to install for the
> + *	new program.  This hook may also optionally check permissions
> + *	(e.g. for transitions between security domains).
> + *	The hook must set @bprm->secureexec to 1 if AT_SECURE should be set to
> + *	request libc enable secure mode.
> + *	@bprm contains the linux_binprm structure.
> + *	Return 0 if the hook is successful and permission is granted.
>   * @bprm_set_creds:
> - *	Save security information in the bprm->security field, typically based
> - *	on information about the bprm->file, for later use by the apply_creds
> - *	hook.  This hook may also optionally check permissions (e.g. for
> + *	Assuming that the relevant bits of @bprm->cred->security have been
> + *	previously set, examine @bprm->file and regenerate them.  This is
> + *	so that the credentials derived from the interpreter the code is
> + *	actually going to run are used rather than credentials derived
> + *	from a script.  This done because the interpreter binary needs to
> + *	reopen script, and may end up opening something completely different.
> + *	This hook may also optionally check permissions (e.g. for
>   *	transitions between security domains).
> - *	This hook may be called multiple times during a single execve, e.g. for
> - *	interpreters.  The hook can tell whether it has already been called by
> - *	checking to see if @bprm->security is non-NULL.  If so, then the hook
> - *	may decide either to retain the security information saved earlier or
> - *	to replace it.  The hook must set @bprm->secureexec to 1 if a "secure
> - *	exec" has happened as a result of this hook call.  The flag is used to
> - *	indicate the need for a sanitized execution environment, and is also
> - *	passed in the ELF auxiliary table on the initial stack to indicate
> - *	whether libc should enable secure mode.
> + *	The hook must set @bprm->cap_elevated to 1 if AT_SECURE should be set to
> + *	request libc enable secure mode.
>   *	@bprm contains the linux_binprm structure.
>   *	Return 0 if the hook is successful and permission is granted.
>   * @bprm_check_security:
>   *	This hook mediates the point when a search for a binary handler will
> - *	begin.  It allows a check the @bprm->security value which is set in the
> - *	preceding set_creds call.  The primary difference from set_creds is
> - *	that the argv list and envp list are reliably available in @bprm.  This
> - *	hook may be called multiple times during a single execve; and in each
> - *	pass set_creds is called first.
> + *	begin.  It allows a check against the @bprm->cred->security value
> + *	which was set in the preceding creds_for_exec call.  The argv list and
> + *	envp list are reliably available in @bprm.  This hook may be called
> + *	multiple times during a single execve.
>   *	@bprm contains the linux_binprm structure.
>   *	Return 0 if the hook is successful and permission is granted.
>   * @bprm_committing_creds:
>   *	Prepare to install the new security attributes of a process being
>   *	transformed by an execve operation, based on the old credentials
>   *	pointed to by @current->cred and the information set in @bprm->cred by
> - *	the bprm_set_creds hook.  @bprm points to the linux_binprm structure.
> - *	This hook is a good place to perform state changes on the process such
> - *	as closing open file descriptors to which access will no longer be
> - *	granted when the attributes are changed.  This is called immediately
> - *	before commit_creds().
> + *	the bprm_creds_for_exec hook.  @bprm points to the linux_binprm
> + *	structure.  This hook is a good place to perform state changes on the
> + *	process such as closing open file descriptors to which access will no
> + *	longer be granted when the attributes are changed.  This is called
> + *	immediately before commit_creds().
>   * @bprm_committed_creds:
>   *	Tidy up after the installation of the new security attributes of a
>   *	process being transformed by an execve operation.  The new credentials
> diff --git a/include/linux/security.h b/include/linux/security.h
> index a8d9310472df..1bd7a6582775 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -276,6 +276,7 @@ int security_quota_on(struct dentry *dentry);
>  int security_syslog(int type);
>  int security_settime64(const struct timespec64 *ts, const struct timezone *tz);
>  int security_vm_enough_memory_mm(struct mm_struct *mm, long pages);
> +int security_bprm_creds_for_exec(struct linux_binprm *bprm);
>  int security_bprm_set_creds(struct linux_binprm *bprm);
>  int security_bprm_check(struct linux_binprm *bprm);
>  void security_bprm_committing_creds(struct linux_binprm *bprm);
> @@ -569,6 +570,11 @@ static inline int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
>  	return __vm_enough_memory(mm, pages, cap_vm_enough_memory(mm, pages));
>  }
>  
> +static inline int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> +{
> +	return 0;
> +}
> +
>  static inline int security_bprm_set_creds(struct linux_binprm *bprm)
>  {
>  	return cap_bprm_set_creds(bprm);
> diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
> index 6ceb74e0f789..0b870a647488 100644
> --- a/security/apparmor/domain.c
> +++ b/security/apparmor/domain.c
> @@ -854,14 +854,14 @@ static struct aa_label *handle_onexec(struct aa_label *label,
>  }
>  
>  /**
> - * apparmor_bprm_set_creds - set the new creds on the bprm struct
> + * apparmor_bprm_creds_for_exec - Update the new creds on the bprm struct
>   * @bprm: binprm for the exec  (NOT NULL)
>   *
>   * Returns: %0 or error on failure
>   *
>   * TODO: once the other paths are done see if we can't refactor into a fn
>   */
> -int apparmor_bprm_set_creds(struct linux_binprm *bprm)
> +int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm)
>  {
>  	struct aa_task_ctx *ctx;
>  	struct aa_label *label, *new = NULL;
> @@ -875,9 +875,6 @@ int apparmor_bprm_set_creds(struct linux_binprm *bprm)
>  		file_inode(bprm->file)->i_mode
>  	};
>  
> -	if (bprm->called_set_creds)
> -		return 0;
> -
>  	ctx = task_ctx(current);
>  	AA_BUG(!cred_label(bprm->cred));
>  	AA_BUG(!ctx);
> diff --git a/security/apparmor/include/domain.h b/security/apparmor/include/domain.h
> index 21b875fe2d37..d14928fe1c6f 100644
> --- a/security/apparmor/include/domain.h
> +++ b/security/apparmor/include/domain.h
> @@ -30,7 +30,7 @@ struct aa_domain {
>  struct aa_label *x_table_lookup(struct aa_profile *profile, u32 xindex,
>  				const char **name);
>  
> -int apparmor_bprm_set_creds(struct linux_binprm *bprm);
> +int apparmor_bprm_creds_for_exec(struct linux_binprm *bprm);
>  
>  void aa_free_domain_entries(struct aa_domain *domain);
>  int aa_change_hat(const char *hats[], int count, u64 token, int flags);
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index b621ad74f54a..3623ab08279d 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -1232,7 +1232,7 @@ static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(cred_prepare, apparmor_cred_prepare),
>  	LSM_HOOK_INIT(cred_transfer, apparmor_cred_transfer),
>  
> -	LSM_HOOK_INIT(bprm_set_creds, apparmor_bprm_set_creds),
> +	LSM_HOOK_INIT(bprm_creds_for_exec, apparmor_bprm_creds_for_exec),
>  	LSM_HOOK_INIT(bprm_committing_creds, apparmor_bprm_committing_creds),
>  	LSM_HOOK_INIT(bprm_committed_creds, apparmor_bprm_committed_creds),
>  
> diff --git a/security/security.c b/security/security.c
> index 7fed24b9d57e..4ee76a729f73 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -823,6 +823,11 @@ int security_vm_enough_memory_mm(struct mm_struct *mm, long pages)
>  	return __vm_enough_memory(mm, pages, cap_sys_admin);
>  }
>  
> +int security_bprm_creds_for_exec(struct linux_binprm *bprm)
> +{
> +	return call_int_hook(bprm_creds_for_exec, 0, bprm);
> +}
> +
>  int security_bprm_set_creds(struct linux_binprm *bprm)
>  {
>  	return call_int_hook(bprm_set_creds, 0, bprm);
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 0b4e32161b77..718345dd76bb 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2286,7 +2286,7 @@ static int check_nnp_nosuid(const struct linux_binprm *bprm,
>  	return -EACCES;
>  }
>  
> -static int selinux_bprm_set_creds(struct linux_binprm *bprm)
> +static int selinux_bprm_creds_for_exec(struct linux_binprm *bprm)
>  {
>  	const struct task_security_struct *old_tsec;
>  	struct task_security_struct *new_tsec;
> @@ -2297,8 +2297,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
>  
>  	/* SELinux context only depends on initial program or script and not
>  	 * the script interpreter */
> -	if (bprm->called_set_creds)
> -		return 0;
>  
>  	old_tsec = selinux_cred(current_cred());
>  	new_tsec = selinux_cred(bprm->cred);
> @@ -6385,7 +6383,7 @@ static int selinux_setprocattr(const char *name, void *value, size_t size)
>  	/* Permission checking based on the specified context is
>  	   performed during the actual operation (execve,
>  	   open/mkdir/...), when we know the full context of the
> -	   operation.  See selinux_bprm_set_creds for the execve
> +	   operation.  See selinux_bprm_creds_for_exec for the execve
>  	   checks and may_create for the file creation checks. The
>  	   operation will then fail if the context is not permitted. */
>  	tsec = selinux_cred(new);
> @@ -6914,7 +6912,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
>  
>  	LSM_HOOK_INIT(netlink_send, selinux_netlink_send),
>  
> -	LSM_HOOK_INIT(bprm_set_creds, selinux_bprm_set_creds),
> +	LSM_HOOK_INIT(bprm_creds_for_exec, selinux_bprm_creds_for_exec),
>  	LSM_HOOK_INIT(bprm_committing_creds, selinux_bprm_committing_creds),
>  	LSM_HOOK_INIT(bprm_committed_creds, selinux_bprm_committed_creds),
>  
> diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
> index 8c61d175e195..0ac8f4518d07 100644
> --- a/security/smack/smack_lsm.c
> +++ b/security/smack/smack_lsm.c
> @@ -891,12 +891,12 @@ static int smack_sb_statfs(struct dentry *dentry)
>   */
>  
>  /**
> - * smack_bprm_set_creds - set creds for exec
> + * smack_bprm_creds_for_exec - Update bprm->cred if needed for exec
>   * @bprm: the exec information
>   *
>   * Returns 0 if it gets a blob, -EPERM if exec forbidden and -ENOMEM otherwise
>   */
> -static int smack_bprm_set_creds(struct linux_binprm *bprm)
> +static int smack_bprm_creds_for_exec(struct linux_binprm *bprm)
>  {
>  	struct inode *inode = file_inode(bprm->file);
>  	struct task_smack *bsp = smack_cred(bprm->cred);
> @@ -904,9 +904,6 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
>  	struct superblock_smack *sbsp;
>  	int rc;
>  
> -	if (bprm->called_set_creds)
> -		return 0;
> -
>  	isp = smack_inode(inode);
>  	if (isp->smk_task == NULL || isp->smk_task == bsp->smk_task)
>  		return 0;
> @@ -4598,7 +4595,7 @@ static struct security_hook_list smack_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(sb_statfs, smack_sb_statfs),
>  	LSM_HOOK_INIT(sb_set_mnt_opts, smack_set_mnt_opts),
>  
> -	LSM_HOOK_INIT(bprm_set_creds, smack_bprm_set_creds),
> +	LSM_HOOK_INIT(bprm_creds_for_exec, smack_bprm_creds_for_exec),
>  
>  	LSM_HOOK_INIT(inode_alloc_security, smack_inode_alloc_security),
>  	LSM_HOOK_INIT(inode_init_security, smack_inode_init_security),
> diff --git a/security/tomoyo/tomoyo.c b/security/tomoyo/tomoyo.c
> index 716c92ec941a..f9adddc42ac8 100644
> --- a/security/tomoyo/tomoyo.c
> +++ b/security/tomoyo/tomoyo.c
> @@ -63,20 +63,14 @@ static void tomoyo_bprm_committed_creds(struct linux_binprm *bprm)
>  
>  #ifndef CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER
>  /**
> - * tomoyo_bprm_set_creds - Target for security_bprm_set_creds().
> + * tomoyo_bprm_for_exec - Target for security_bprm_creds_for_exec().
>   *
>   * @bprm: Pointer to "struct linux_binprm".
>   *
>   * Returns 0.
>   */
> -static int tomoyo_bprm_set_creds(struct linux_binprm *bprm)
> +static int tomoyo_bprm_creds_for_exec(struct linux_binprm *bprm)
>  {
> -	/*
> -	 * Do only if this function is called for the first time of an execve
> -	 * operation.
> -	 */
> -	if (bprm->called_set_creds)
> -		return 0;
>  	/*
>  	 * Load policy if /sbin/tomoyo-init exists and /sbin/init is requested
>  	 * for the first time.
> @@ -539,7 +533,7 @@ static struct security_hook_list tomoyo_hooks[] __lsm_ro_after_init = {
>  	LSM_HOOK_INIT(task_alloc, tomoyo_task_alloc),
>  	LSM_HOOK_INIT(task_free, tomoyo_task_free),
>  #ifndef CONFIG_SECURITY_TOMOYO_OMIT_USERSPACE_LOADER
> -	LSM_HOOK_INIT(bprm_set_creds, tomoyo_bprm_set_creds),
> +	LSM_HOOK_INIT(bprm_creds_for_exec, tomoyo_bprm_creds_for_exec),
>  #endif
>  	LSM_HOOK_INIT(bprm_check_security, tomoyo_bprm_check_security),
>  	LSM_HOOK_INIT(file_fcntl, tomoyo_file_fcntl),
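A rough userspace model of how the new security_bprm_creds_for_exec() wrapper dispatches through call_int_hook(). In this simplified model, the default value (0 here) is returned when no LSM registers the hook, and the registered hooks run in order until one returns nonzero. The struct and hook names are hypothetical, and the real macro walks security_hook_heads rather than an array:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct linux_binprm; counts hook invocations. */
struct binprm_sketch {
	int calls;
};

typedef int (*bprm_hook_fn)(struct binprm_sketch *bprm);

/*
 * Simplified model of call_int_hook(): start from the default value and
 * run each registered hook in order, stopping at the first nonzero result.
 */
static int call_int_hook_sketch(const bprm_hook_fn *hooks, size_t n,
				int irc, struct binprm_sketch *bprm)
{
	int rc = irc;

	for (size_t i = 0; i < n; i++) {
		rc = hooks[i](bprm);
		if (rc != 0)
			break;
	}
	return rc;
}

/* Hypothetical LSM that permits the exec. */
static int allow_hook(struct binprm_sketch *bprm)
{
	bprm->calls++;
	return 0;
}

/* Hypothetical LSM that refuses the exec. */
static int deny_hook(struct binprm_sketch *bprm)
{
	bprm->calls++;
	return -EACCES;
}
```

With no hooks registered the default is returned, which is what the !CONFIG_SECURITY inline stub returning 0 also provides.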


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids
  2020-05-19  0:29       ` [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids Eric W. Biederman
@ 2020-05-19 18:03         ` Kees Cook
  2020-05-19 18:28           ` Linus Torvalds
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-19 18:03 UTC
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:29:41PM -0500, Eric W. Biederman wrote:
> 
> It is almost possible to use the result of prepare_exec_creds with no
> modifications during exec.  Update prepare_exec_creds to initialize
> the suid and the fsuid to the euid, and the sgid and the fsgid to the
> egid.  This is all that is needed to handle the common case of exec
> when nothing special like a setuid exec is happening.
> 
> That this preserves the existing behavior of exec can be verified
> by examining bprm_fill_uid and cap_bprm_set_creds.

Yup, agreed.

> This change makes it clear that the later parts of exec that
> update bprm->cred only need to handle special cases such
> as setuid exec and change of domains.
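A minimal sketch of the defaults described above, assuming a simplified credential struct with plain integer IDs (the real struct cred uses kuid_t/kgid_t; cred_sketch and prepare_exec_creds_sketch are hypothetical names). The saved and filesystem IDs simply follow the effective IDs, which is the state an ordinary non-setuid exec ends up in:

```c
#include <assert.h>

/* Hypothetical subset of struct cred, using plain integers. */
struct cred_sketch {
	unsigned int uid, euid, suid, fsuid;
	unsigned int gid, egid, sgid, fsgid;
};

/* Model of the new prepare_exec_creds defaults for exec. */
static struct cred_sketch prepare_exec_creds_sketch(const struct cred_sketch *old)
{
	struct cred_sketch new = *old;

	new.suid = new.fsuid = new.euid;	/* saved/fs uid follow euid */
	new.sgid = new.fsgid = new.egid;	/* saved/fs gid follow egid */
	return new;
}
```

Only the special cases (setuid/setgid files, LSM domain changes) then need to modify the result further.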

One question, though: why add this, since the repeat calling of the caps
LSM hook will do this? Is there a call ordering change here, or is this
just to make the new LSM hook more robust?

Regardless, this looks correct, if perhaps redundant. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
  2020-05-19  0:30       ` [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds Eric W. Biederman
  2020-05-19 15:34         ` Casey Schaufler
@ 2020-05-19 18:10         ` Kees Cook
  2020-05-19 21:28           ` James Morris
  1 sibling, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-19 18:10 UTC
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:30:10PM -0500, Eric W. Biederman wrote:
> 
> Today security_bprm_set_creds has several implementations:
> apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds,
> smack_bprm_set_creds, and tomoyo_bprm_set_creds.
> 
> Except for cap_bprm_set_creds they all test bprm->called_set_creds and
> return immediately if it is true.  The function cap_bprm_set_creds
> ignores bprm->called_set_creds entirely.
> 
> Create a new LSM hook security_bprm_creds_for_exec that is called just
> before prepare_binprm in __do_execve_file, resulting in an LSM hook
> that is called exactly once for the entirety of exec.  Move the bits
> of security_bprm_set_creds that only want to be called once per exec
> into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds
> behind.
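To illustrate the call-count difference, here is a toy model assuming an exec that resolves a fixed number of nested interpreters (#! scripts, binfmt_misc). All names are hypothetical; the per-pass hook stands in for what remains of security_bprm_set_creds, and the once-only hook for security_bprm_creds_for_exec:

```c
#include <assert.h>

/* Hypothetical exec state; the counters record how often each hook ran. */
struct exec_sketch {
	int depth;		/* interpreters still to resolve */
	int creds_for_exec_calls;
	int set_creds_calls;
};

/* Runs exactly once per exec, before the binfmt search starts. */
static int creds_for_exec_hook(struct exec_sketch *e)
{
	e->creds_for_exec_calls++;
	return 0;
}

/* Runs on every pass, because each interpreter may change the creds. */
static int set_creds_hook(struct exec_sketch *e)
{
	e->set_creds_calls++;
	return 0;
}

static int run_exec_sketch(struct exec_sketch *e)
{
	int ret = creds_for_exec_hook(e);

	if (ret)
		return ret;
	for (;;) {
		ret = set_creds_hook(e);
		if (ret)
			return ret;
		if (e->depth == 0)
			return 0;	/* reached a real binary */
		e->depth--;		/* a handler found another interpreter */
	}
}
```

With the once-only work moved out of the per-pass hook, a flag like called_set_creds is no longer needed to skip repeated calls.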
> 
> Remove bprm->called_set_creds; all of its former users have been moved
> to security_bprm_creds_for_exec.
> 
> Add or update comments as appropriate to bring them up to date and
> to reflect this change.

Yup, awesome. One nit below.

Reviewed-by: Kees Cook <keescook@chromium.org>

> [...]
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 0b4e32161b77..718345dd76bb 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> [...]
> @@ -2297,8 +2297,6 @@ static int selinux_bprm_set_creds(struct linux_binprm *bprm)
>  
>  	/* SELinux context only depends on initial program or script and not
>  	 * the script interpreter */
> -	if (bprm->called_set_creds)
> -		return 0;
>  
>  	old_tsec = selinux_cred(current_cred());
>  	new_tsec = selinux_cred(bprm->cred);

As you've done in the other LSMs, I think this comment can be removed
(or moved to the top of the function) too.

-- 
Kees Cook

* Re: [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  2020-05-19  0:31       ` [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds Eric W. Biederman
@ 2020-05-19 18:21         ` Kees Cook
  2020-05-19 19:03           ` Eric W. Biederman
  2020-05-19 21:52         ` James Morris
  1 sibling, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-19 18:21 UTC
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
> 
> Rename bprm->cap_elevated to bprm->active_secureexec and initialize it
> in prepare_binprm instead of in cap_bprm_set_creds.  Initializing
> bprm->active_secureexec in prepare_binprm allows multiple
> implementations of security_bprm_repopulate_creds to play nicely with
> each other.
> 
> Rename security_bprm_set_creds to security_bprm_repopulate_creds to
> emphasize that this path recomputes part of bprm->cred.  This
> recomputation avoids the time of check vs time of use problems that
> are inherent in unix #! interpreters.
> 
> In short: two renames and a move in the location of initializing
> bprm->active_secureexec.
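A hedged model of why initializing the flag in prepare_binprm helps stacked hooks. If the caller clears it once per pass, each hook only ever sets it, so one implementation cannot undo another's result. A later pass also overrides an earlier one, matching the "most recent call" wording. The final fold into secureexec assumes the setup_new_exec() behaviour of this era (bprm->secureexec |= ...). All names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of struct linux_binprm. */
struct binprm_sketch {
	bool file_is_setuid;	/* input for the current pass */
	bool active_secureexec;	/* result of the most recent pass only */
	bool secureexec;	/* final answer, folded in once at the end */
};

/* Each repopulate hook may set the flag but never clears it. */
static int cap_like_hook(struct binprm_sketch *b)
{
	if (b->file_is_setuid)
		b->active_secureexec = true;
	return 0;
}

static int other_lsm_hook(struct binprm_sketch *b)
{
	(void)b;	/* this module has nothing to add on this pass */
	return 0;
}

/* The caller, not any single hook, resets the flag before each pass. */
static int prepare_binprm_sketch(struct binprm_sketch *b)
{
	int ret;

	b->active_secureexec = false;
	ret = cap_like_hook(b);
	if (!ret)
		ret = other_lsm_hook(b);
	return ret;
}

/* Once no further passes can happen, keep the last pass's result. */
static void finalize_sketch(struct binprm_sketch *b)
{
	b->secureexec |= b->active_secureexec;
}
```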

I like this much better than the direct call to the capabilities hook.
Thanks!

Reviewed-by: Kees Cook <keescook@chromium.org>

One nit is a bikeshed on the name "active_secureexec", since
the word "active" isn't really associated with any other part of the
binfmt logic. It's supposed to be "latest state from the binfmt loop",
so instead of "active", I considered these words that I also didn't
like: "current", "this", "recent", and "now". Is "latest" better than
"active"? Probably not.

> [...]
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index d1217fcdedea..8605ab4a0f89 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -27,10 +27,10 @@ struct linux_binprm {
>  	unsigned long argmin; /* rlimit marker for copy_strings() */
>  	unsigned int
>  		/*
> -		 * True if most recent call to cap_bprm_set_creds
> +		 * True if most recent call to security_bprm_set_creds
>  		 * resulted in elevated privileges.
>  		 */
> -		cap_elevated:1,
> +		active_secureexec:1,

Also, I'd like it if this comment could be made more verbose as well, for
anyone trying to understand the binfmt execution flow for the first time.
Perhaps:

		/*
		 * Must be set True during any call to
		 * bprm_set_creds hook where the execution would
		 * result in elevated privileges. (The hook can be
		 * called multiple times during nested interpreter
		 * resolution across binfmt_script, binfmt_misc, etc).
		 */


-- 
Kees Cook

* Re: [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally
  2020-05-19  0:31       ` [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
@ 2020-05-19 18:27         ` Kees Cook
  2020-05-19 19:08           ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-19 18:27 UTC
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:31:51PM -0500, Eric W. Biederman wrote:
> 
> Add a flag preserve_creds that binfmt_misc can set to prevent
> credentials from being updated.  This allows binfmt_misc to always
> call prepare_binfmt.  Allowing the credential computation logic to be

typo: prepare_binprm()

> consolidated.
> 
> Not replacing the credentials with the interpreter's credentials is
> safe because an open file descriptor to the executable is
> passed to the interpreter.   As the interpreter does not need to
> reopen the executable it is guaranteed to see the same file that
> exec sees.
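A minimal sketch of the flag's effect. Hypothetical fields stand in for the credential computation; the real code recomputes parts of bprm->cred from bprm->file rather than copying an owner ID:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fields: an owner ID stands in for the whole credential set. */
struct binprm_sketch {
	bool preserve_creds;	/* set by a handler such as binfmt_misc */
	int cred_owner;		/* whose file the current creds derive from */
	int file_owner;		/* owner of bprm->file on this pass */
};

static void prepare_binprm_sketch(struct binprm_sketch *b)
{
	/*
	 * Recompute creds from the file being executed unless the handler
	 * has said the creds computed on an earlier pass may be kept.
	 */
	if (!b->preserve_creds)
		b->cred_owner = b->file_owner;
}
```

Because the check lives inside the prepare step, a handler can always call it and only has to set a flag, rather than skip the call.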

Yup, looks good. Note below on comment.

Reviewed-by: Kees Cook <keescook@chromium.org>

> [...]
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 8605ab4a0f89..dbb5614d62a2 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> @@ -26,6 +26,8 @@ struct linux_binprm {
>  	unsigned long p; /* current top of mem */
>  	unsigned long argmin; /* rlimit marker for copy_strings() */
>  	unsigned int
> +		/* It is safe to use the creds of a script (see binfmt_misc) */
> +		preserve_creds:1,

How about:

		/*
		 * A binfmt handler will set this to True before calling
		 * prepare_binprm() if it is safe to reuse the previous
		 * credentials, based on bprm->file (see binfmt_misc).
		 */

-- 
Kees Cook

* Re: [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler
  2020-05-19  0:32       ` [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
@ 2020-05-19 18:27         ` Kees Cook
  2020-05-19 21:30         ` James Morris
  1 sibling, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-19 18:27 UTC
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:32:18PM -0500, Eric W. Biederman wrote:
> 
> The code in prepare_binprm needs to be run every time
> search_binary_handler is called so move the call into search_binary_handler
> itself to make the code simpler and easier to understand.
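In a userspace sketch with hypothetical names, the restructuring looks roughly like this. The prepare step runs at the top of every search, and a handler returning -ENOEXEC lets the search move on to the next format. This simplified loop omits the kernel's extra error handling:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct linux_binprm. */
struct binprm_sketch {
	int prepare_calls;
};

typedef int (*load_fn)(struct binprm_sketch *b);

static int prepare_binprm_sketch(struct binprm_sketch *b)
{
	b->prepare_calls++;
	return 0;
}

/* Callers no longer call the prepare step themselves, so none can forget it. */
static int search_binary_handler_sketch(struct binprm_sketch *b,
					const load_fn *fmts, size_t n)
{
	int ret = prepare_binprm_sketch(b);

	if (ret < 0)
		return ret;
	for (size_t i = 0; i < n; i++) {
		ret = fmts[i](b);
		if (ret != -ENOEXEC)
			return ret;	/* handled, or a hard error */
	}
	return -ENOEXEC;
}

/* Hypothetical formats: one declines the file, one accepts it. */
static int not_mine(struct binprm_sketch *b) { (void)b; return -ENOEXEC; }
static int mine(struct binprm_sketch *b) { (void)b; return 0; }
```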
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

* Re: [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids
  2020-05-19 18:03         ` Kees Cook
@ 2020-05-19 18:28           ` Linus Torvalds
  2020-05-19 18:57             ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Linus Torvalds @ 2020-05-19 18:28 UTC
  To: Kees Cook
  Cc: Eric W. Biederman, Linux Kernel Mailing List, Oleg Nesterov,
	Jann Horn, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, LSM List, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Tue, May 19, 2020 at 11:03 AM Kees Cook <keescook@chromium.org> wrote:
>
> One question, though: why add this, since the repeat calling of the caps
> LSM hook will do this?

I assume it's for the "preserve_creds" case where we don't even end up
setting creds at all.

Yeah, at some point we'll hit a bprm handler that doesn't set
'preserve_creds', and it all does get set in the end, but that's not
statically all that obvious.

I think it makes sense to initialize as much as possible from the
generic code, and rely as little as possible on what the binfmt
handlers end up actually doing.

              Linus

* Re: [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids
  2020-05-19 18:28           ` Linus Torvalds
@ 2020-05-19 18:57             ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19 18:57 UTC
  To: Linus Torvalds
  Cc: Kees Cook, Linux Kernel Mailing List, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	LSM List, James Morris, Serge E. Hallyn, Andy Lutomirski

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, May 19, 2020 at 11:03 AM Kees Cook <keescook@chromium.org> wrote:
>>
>> One question, though: why add this, since the repeat calling of the caps
>> LSM hook will do this?
>
> I assume it's for the "preserve_creds" case where we don't even end up
> setting creds at all.
>
> Yeah, at some point we'll hit a bprm handler that doesn't set
> 'preserve_creds', and it all does get set in the end, but that's not
> statically all that obvious.
>
> I think it makes sense to initialize as much as possible from the
> generic code, and rely as little as possible on what the binfmt
> handlers end up actually doing.

Where this initially came from was I was looking at how to clean up the
case of no_new_privs/ptrace of a suid executable when we don't have
enough permissions.   Just being able to create creds that kept
everything as they were looked very useful and there was just this one
little bit missing.

I included the change to prepare_exec_creds in this patchset to
emphasize that neither security_bprm_creds_for_exec nor
security_bprm_repopulate_creds need to do anything if there is nothing
special going on.

At the very least that helps me think through what the LSMs are required
to do, and what those hooks are for.  AKA privilege changing execs.

So I was thinking rely on the LSMs as little as possible rather than
rely on the binfmt handlers as little as possible.  But it is the same
idea.

And yes it makes everything easier to analyze if everything starts off
in a known good state.

Eric

* Re: [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  2020-05-19 18:21         ` Kees Cook
@ 2020-05-19 19:03           ` Eric W. Biederman
  2020-05-19 19:14             ` Kees Cook
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19 19:03 UTC
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
>> 
>> Rename bprm->cap_elevated to bprm->active_secureexec and initialize it
>> in prepare_binprm instead of in cap_bprm_set_creds.  Initializing
>> bprm->active_secureexec in prepare_binprm allows multiple
>> implementations of security_bprm_repopulate_creds to play nicely with
>> each other.
>> 
>> Rename security_bprm_set_creds to security_bprm_repopulate_creds to
>> emphasize that this path recomputes part of bprm->cred.  This
>> recomputation avoids the time of check vs time of use problems that
>> are inherent in unix #! interpreters.
>> 
>> In short: two renames and a move in the location of initializing
>> bprm->active_secureexec.
>
> I like this much better than the direct call to the capabilities hook.
> Thanks!
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
> One nit is a bikeshed on the name "active_secureexec", since
> the word "active" isn't really associated with any other part of the
> binfmt logic. It's supposed to be "latest state from the binfmt loop",
> so instead of "active", I considered these words that I also didn't
> like: "current", "this", "recent", and "now". Is "latest" better than
> "active"? Probably not.

I had pretty much the same problem.  Active at least conveys that it
is still malleable and might change.

>> [...]
>> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
>> index d1217fcdedea..8605ab4a0f89 100644
>> --- a/include/linux/binfmts.h
>> +++ b/include/linux/binfmts.h
>> @@ -27,10 +27,10 @@ struct linux_binprm {
>>  	unsigned long argmin; /* rlimit marker for copy_strings() */
>>  	unsigned int
>>  		/*
>> -		 * True if most recent call to cap_bprm_set_creds
>> +		 * True if most recent call to security_bprm_set_creds
>>  		 * resulted in elevated privileges.
>>  		 */
>> -		cap_elevated:1,
>> +		active_secureexec:1,
>
> Also, I'd like it if this comment could be made more verbose as well, for
> anyone trying to understand the binfmt execution flow for the first time.
> Perhaps:
>
> 		/*
> 		 * Must be set True during any call to
> 		 * bprm_set_creds hook where the execution would
> 		 * result in elevated privileges. (The hook can be
> 		 * called multiple times during nested interpreter
> 		 * resolution across binfmt_script, binfmt_misc, etc).
> 		 */
Well it is not during but after the call that it becomes true.
I think most recent covers the case of multiple calls.

I think having the loop explicitly in the code a few patches
later makes it clear that there is a loop dealing with interpreters.

Conciseness has a virtue in that it is easy to absorb.  Seeing that
"active" says most recent, where plain "secureexec" does not, is enough
to make a reader ask questions and look at the code.

Eric

* Re: [PATCH v2 6/8] exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
  2020-05-19  0:33       ` [PATCH v2 6/8] exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC Eric W. Biederman
@ 2020-05-19 19:08         ` Kees Cook
  2020-05-19 19:19           ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-19 19:08 UTC
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:33:21PM -0500, Eric W. Biederman wrote:
> 
> The return code -ENOEXEC serves to tell search_binary_handler that it
> should continue searching for the binfmt to handle a given file.  This
> makes returning -ENOEXEC after modifying bprm->buf problematic, as
> bprm->buf is needed to continue the search.
> 
> The current binfmt_script manages to escape problems as it closes and
> clears bprm->file before returning -ENOEXEC with bprm->buf modified.
> This prevents search_binary_handler from looping as it explicitly
> handles a NULL bprm->file.
> 
> I plan on moving all of the bprm->file management into fs/exec.c and out
> of the binary handlers so this will become a problem.
> 
> Move closing bprm->file and the test for BINPRM_FLAGS_PATH_INACCESSIBLE
> down below the last return of -ENOEXEC.
> 
> Introduce i_sep and i_end to track the end of the first argument and
> the end of the parameters respectively.  Using those, constification
> of all char * pointers, and the helpers next_terminator and
> next_non_spacetab guarantee the parameter parsing will not modify
> bprm->buf.

I'm quite pleased this could be implemented using the existing helpers!
It seems Linus and I were on the right track with these. :)

> 
> Only modify bprm->buf to terminate the strings i_arg and i_name with
> '\0' for passing to copy_strings_kernel.
> 
> When replacing loops with next_non_spacetab and next_terminator care
> has been taken that the logic of the parsing code (short of replacing
> characters by '\0') remains the same.
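The parsing rules can be exercised in userspace. This sketch models spacetab(), next_non_spacetab() and next_terminator() on the helpers used in the diff. It assumes BUF_SIZE (256) as a stand-in for sizeof(bprm->buf) and returns -1 where load_script() returns -ENOEXEC. Unlike the kernel, which writes '\0' into bprm->buf only after the last -ENOEXEC, it copies the name and argument out, so the buffer stays untouched on every path. Note that next_terminator() does not treat '\n' as a terminator, which is the case Kees examines in the review:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 256	/* assumed stand-in for sizeof(bprm->buf) */

static bool spacetab(char c)
{
	return c == ' ' || c == '\t';
}

/* First char in [first, last] that is not a space/tab, else NULL. */
static const char *next_non_spacetab(const char *first, const char *last)
{
	for (; first <= last; first++)
		if (!spacetab(*first))
			return first;
	return NULL;
}

/* First space, tab or NUL in [first, last], else NULL ('\n' does not count). */
static const char *next_terminator(const char *first, const char *last)
{
	for (; first <= last; first++)
		if (spacetab(*first) || !*first)
			return first;
	return NULL;
}

/* Like the kernel's strnchr(): searches at most count chars, stops at NUL. */
static const char *strnchr_sketch(const char *s, size_t count, char c)
{
	for (; count-- && *s != '\0'; s++)
		if (*s == c)
			return s;
	return NULL;
}

/* Returns 0 with name/arg filled in, or -1 where load_script() says -ENOEXEC. */
static int parse_shebang(const char *buf, char *name, size_t name_sz,
			 char *arg, size_t arg_sz)
{
	const char *buf_end = buf + BUF_SIZE - 1;
	const char *i_end, *i_name, *i_sep, *i_arg = NULL, *name_end;

	if (buf[0] != '#' || buf[1] != '!')
		return -1;

	i_end = strnchr_sketch(buf, BUF_SIZE, '\n');
	if (!i_end) {
		i_end = next_non_spacetab(buf + 2, buf_end);
		if (!i_end)
			return -1;	/* entire buf is spaces/tabs */
		if (!next_terminator(i_end, buf_end))
			return -1;	/* interpreter path may be truncated */
		i_end = buf_end;
	}

	while (spacetab(i_end[-1]))	/* trim trailing spaces/tabs */
		i_end--;

	i_name = next_non_spacetab(buf + 2, i_end);
	if (!i_name || i_name == i_end)
		return -1;		/* no interpreter name */

	i_sep = next_terminator(i_name, i_end);
	if (i_sep && *i_sep != '\0')
		i_arg = next_non_spacetab(i_sep, i_end);

	/* Copy out instead of NUL-terminating inside buf. */
	name_end = i_arg ? i_sep : i_end;
	snprintf(name, name_sz, "%.*s", (int)(name_end - i_name), i_name);
	if (i_arg)
		snprintf(arg, arg_sz, "%.*s", (int)(i_end - i_arg), i_arg);
	else
		snprintf(arg, arg_sz, "%s", "");
	return 0;
}
```

As in load_script(), everything after the interpreter name is passed as a single argument, so "#!/bin/sh -a -b" yields the name "/bin/sh" and the one argument "-a -b".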

Ah, interesting. As in, bprm->buf must not be modified unless the binfmt
handler is going to succeed. I think this requirement should be
documented in the binfmt struct header file.

> [...]
> diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
> index 8d718d8fd0fe..85e0ef86eb11 100644
> --- a/fs/binfmt_script.c
> +++ b/fs/binfmt_script.c
> @@ -71,39 +56,48 @@ static int load_script(struct linux_binprm *bprm)
>  	 * parse them on its own.
>  	 */
>  	buf_end = bprm->buf + sizeof(bprm->buf) - 1;
> -	cp = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
> -	if (!cp) {
> -		cp = next_non_spacetab(bprm->buf + 2, buf_end);
> -		if (!cp)
> +	i_end = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
> +	if (!i_end) {
> +		i_end = next_non_spacetab(bprm->buf + 2, buf_end);
> +		if (!i_end)
>  			return -ENOEXEC; /* Entire buf is spaces/tabs */
>  		/*
>  		 * If there is no later space/tab/NUL we must assume the
>  		 * interpreter path is truncated.
>  		 */
> -		if (!next_terminator(cp, buf_end))
> +		if (!next_terminator(i_end, buf_end))
>  			return -ENOEXEC;
> -		cp = buf_end;
> +		i_end = buf_end;
>  	}
> -	/* NUL-terminate the buffer and any trailing spaces/tabs. */
> -	*cp = '\0';
> -	while (cp > bprm->buf) {
> -		cp--;
> -		if ((*cp == ' ') || (*cp == '\t'))
> -			*cp = '\0';
> -		else
> -			break;
> -	}
> -	for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
> -	if (*cp == '\0')
> +	/* Trim any trailing spaces/tabs from i_end */
> +	while (spacetab(i_end[-1]))
> +		i_end--;
> +
> +	/* Skip over leading spaces/tabs */
> +	i_name = next_non_spacetab(bprm->buf+2, i_end);
> +	if (!i_name || (i_name == i_end))
>  		return -ENOEXEC; /* No interpreter name found */
> -	i_name = cp;
> +
> +	/* Is there an optional argument? */
>  	i_arg = NULL;
> -	for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
> -		/* nothing */ ;
> -	while ((*cp == ' ') || (*cp == '\t'))
> -		*cp++ = '\0';
> -	if (*cp)
> -		i_arg = cp;
> +	i_sep = next_terminator(i_name, i_end);
> +	if (i_sep && (*i_sep != '\0'))
> +		i_arg = next_non_spacetab(i_sep, i_end);
> +
> +	/*
> +	 * If the script filename will be inaccessible after exec, typically
> +	 * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
> +	 * up now (on the assumption that the interpreter will want to load
> +	 * this file).
> +	 */
> +	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
> +		return -ENOENT;
> +
> +	/* Release since we are not mapping a binary into memory. */
> +	allow_write_access(bprm->file);
> +	fput(bprm->file);
> +	bprm->file = NULL;
> +
>  	/*
>  	 * OK, we've parsed out the interpreter name and
>  	 * (optional) argument.
> @@ -121,7 +115,9 @@ static int load_script(struct linux_binprm *bprm)
>  	if (retval < 0)
>  		return retval;
>  	bprm->argc++;
> +	*((char *)i_end) = '\0';
>  	if (i_arg) {
> +		*((char *)i_sep) = '\0';
>  		retval = copy_strings_kernel(1, &i_arg, bprm);
>  		if (retval < 0)
>  			return retval;

I think this is all correct, though I'm always suspicious of my visual
inspection of string parsers. ;)

I had a worry the \n was not handled correctly in some case. I.e. before
any \n was converted into \0, and so next_terminator() didn't need to
consider \n separately. (next_non_spacetab() doesn't care since \n and \0
are both not ' ' nor '\t'.) For next_terminator(), though, I was worried
there was a case where *i_end == '\n', and next_terminator()
will return NULL instead of "last" due to *last being '\n' instead of
'\0', causing a problem, but you're using the adjusted i_end so I think
it's correct. And you've handled i_name == i_end.

I will see if I can find my testing scripts I used when commit
b5372fe5dc84 originally landed to double-check... until then:

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally
  2020-05-19 18:27         ` Kees Cook
@ 2020-05-19 19:08           ` Eric W. Biederman
  2020-05-19 19:17             ` Kees Cook
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19 19:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Mon, May 18, 2020 at 07:31:51PM -0500, Eric W. Biederman wrote:
>> 
>> Add a flag preserve_creds that binfmt_misc can set to prevent
>> credentials from being updated.  This allows binfmt_misc to always
>> call prepare_binfmt.  Allowing the credential computation logic to be
>
> typo: prepare_binprm()

Thank you.

>> consolidated.
>> 
>> Not replacing the credentials with the interpreter's credentials is
>> safe because an open file descriptor to the executable is
>> passed to the interpreter.  As the interpreter does not need to
>> reopen the executable, it is guaranteed to see the same file that
>> exec sees.
>
> Yup, looks good. Note below on comment.
>
> Reviewed-by: Kees Cook <keescook@chromium.org>
>
>> [...]
>> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
>> index 8605ab4a0f89..dbb5614d62a2 100644
>> --- a/include/linux/binfmts.h
>> +++ b/include/linux/binfmts.h
>> @@ -26,6 +26,8 @@ struct linux_binprm {
>>  	unsigned long p; /* current top of mem */
>>  	unsigned long argmin; /* rlimit marker for copy_strings() */
>>  	unsigned int
>> +		/* It is safe to use the creds of a script (see binfmt_misc) */
>> +		preserve_creds:1,
>
> How about:
>
> 		/*
> 		 * A binfmt handler will set this to True before calling
> 		 * prepare_binprm() if it is safe to reuse the previous
> 		 * credentials, based on bprm->file (see binfmt_misc).
> 		 */

I think that is more words saying less.

While I agree it might be better, I don't see what your comment adds to
the understanding.  What do you see my comment not saying that is important?

Eric


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  2020-05-19 19:03           ` Eric W. Biederman
@ 2020-05-19 19:14             ` Kees Cook
  2020-05-20 20:22               ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-19 19:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Tue, May 19, 2020 at 02:03:23PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
> >> [...]
> >> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> >> index d1217fcdedea..8605ab4a0f89 100644
> >> --- a/include/linux/binfmts.h
> >> +++ b/include/linux/binfmts.h
> >> @@ -27,10 +27,10 @@ struct linux_binprm {
> >>  	unsigned long argmin; /* rlimit marker for copy_strings() */
> >>  	unsigned int
> >>  		/*
> >> -		 * True if most recent call to cap_bprm_set_creds
> >> +		 * True if most recent call to security_bprm_set_creds
> >>  		 * resulted in elevated privileges.
> >>  		 */
> >> -		cap_elevated:1,
> >> +		active_secureexec:1,
> >
> > Also, I'd like it if this comment could be made more verbose as well, for
> > anyone trying to understand the binfmt execution flow for the first time.
> > Perhaps:
> >
> > 		/*
> > 		 * Must be set True during any call to the
> > 		 * bprm_set_creds hook where the execution would
> > 		 * result in elevated privileges. (The hook can be
> > 		 * called multiple times during nested interpreter
> > 		 * resolution across binfmt_script, binfmt_misc, etc).
> > 		 */
> Well it is not during but after the call that it becomes true.
> I think most recent covers the case of multiple calls.

I'm thinking of an LSM author reading these comments to decide what
they need to do to the flags, so it's a direction to them to set it to
true if they have determined that privilege was gained. (Though in
theory, this is all moot since only the commoncap hook cares.)

> I think having the loop explicitly in the code a few patches
> later makes it clear that there is a loop dealing with interpreters.
> 
> Conciseness has a virtue in that it is easy to absorb.  Seeing that
> active_secureexec says "most recent" and secureexec does not is
> enough to prompt questions and a look at the code.

I still think a hint about the nature of nested exec resolution would be
nice in here somewhere, especially given that this value is zeroed
before each call to the hook.
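A small model of the semantics under discussion, assuming (as described in this thread) that active_secureexec is zeroed before each hook call and merged into secureexec only once, after the last interpreter has been resolved. struct bprm_model, repopulate_creds_hook() and final_secureexec() are hypothetical names for illustration, not the kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not struct linux_binprm. */
struct bprm_model {
	bool secureexec;	/* sticky: may be set by other checks */
	bool active_secureexec;	/* result of the most recent hook call only */
};

/* Stand-in for the LSM hook: "elevated" says whether this step gains privilege. */
static void repopulate_creds_hook(struct bprm_model *bprm, bool elevated)
{
	if (elevated)
		bprm->active_secureexec = true;
}

/* elevated[i] describes step i of the script -> interpreter chain. */
static bool final_secureexec(const bool *elevated, size_t n, bool initial)
{
	struct bprm_model bprm = { .secureexec = initial };

	for (size_t i = 0; i < n; i++) {
		bprm.active_secureexec = false;	/* zeroed before each call */
		repopulate_creds_hook(&bprm, elevated[i]);
	}
	/* Merge the final state once, as begin_new_exec() does in the patch. */
	bprm.secureexec |= bprm.active_secureexec;
	return bprm.secureexec;
}
```

In this model only the last step's result survives: an elevated first step followed by an unprivileged interpreter ends with secureexec clear unless it was already set.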

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally
  2020-05-19 19:08           ` Eric W. Biederman
@ 2020-05-19 19:17             ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-19 19:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Tue, May 19, 2020 at 02:08:34PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Mon, May 18, 2020 at 07:31:51PM -0500, Eric W. Biederman wrote:
> >> [...]
> >> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> >> index 8605ab4a0f89..dbb5614d62a2 100644
> >> --- a/include/linux/binfmts.h
> >> +++ b/include/linux/binfmts.h
> >> @@ -26,6 +26,8 @@ struct linux_binprm {
> >>  	unsigned long p; /* current top of mem */
> >>  	unsigned long argmin; /* rlimit marker for copy_strings() */
> >>  	unsigned int
> >> +		/* It is safe to use the creds of a script (see binfmt_misc) */
> >> +		preserve_creds:1,
> >
> > How about:
> >
> > 		/*
> > 		 * A binfmt handler will set this to True before calling
> > 		 * prepare_binprm() if it is safe to reuse the previous
> > 		 * credentials, based on bprm->file (see binfmt_misc).
> > 		 */
> 
> I think that is more words saying less.
> 
> While I agree it might be better, I don't see what your comment adds to
> the understanding.  What do you see my comment not saying that is important?

I think your comment is aimed at the consumer of preserve_creds (i.e.
the fs/exec.c code), whereas I think the comment should be directed at
a binfmt author, who wants to answer the question "why would I set this
flag?" Though I strongly hope we never have new binfmts. ;)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 6/8] exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
  2020-05-19 19:08         ` Kees Cook
@ 2020-05-19 19:19           ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19 19:19 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Mon, May 18, 2020 at 07:33:21PM -0500, Eric W. Biederman wrote:
>> 
>> When replacing loops with next_non_spacetab and next_terminator, care
>> has been taken that the logic of the parsing code (short of replacing
>> characters by '\0') remains the same.
>
> Ah, interesting. As in, bprm->buf must not be modified unless the binfmt
> handler is going to succeed. I think this requirement should be
> documented in the binfmt struct header file.

I think the best way to document this is to modify bprm->buf to be
"const char buf[BINPRM_BUF_SIZE]" or something like that and not
allow any modifications by anything except for the code that
initially reads in the contents of the file.

That unfortunately requires copy_strings_kernel (which has since become
copy_string_kernel) to take a length.  Then I wouldn't need to modify
the buffer at all here.
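If the string copy took a length, the parser could record (pointer, length) spans and never write a '\0' into a read-only buffer. A minimal userspace sketch of that idea: copy_span() is a hypothetical helper, not a kernel function (POSIX strndup() does much the same job, except that it also stops at an embedded NUL).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes starting at start into a fresh
 * NUL-terminated string.  The source buffer is only ever read.
 */
static char *copy_span(const char *start, size_t len)
{
	char *s = malloc(len + 1);

	if (!s)
		return NULL;
	memcpy(s, start, len);
	s[len] = '\0';
	return s;
}
```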

I believe binfmt_script is a bit unique in wanting to modify the buffer
because it is parsing strings.

The requirement is that a binfmt should not modify bprm unless it will
succeed or fail with an error that is not -ENOEXEC.  The fundamental
issue is that search_binary_handler will reuse bprm if -ENOEXEC is
returned.

Until the next patch there is an escape hatch (clearing and closing
bprm->file), but that goes away, which is why I need this patch.

I guess I can see adding a comment about the general case of not
changing bprm unless you are doing something other than returning
-ENOEXEC and letting the search continue.

Eric


>> [...]
>> diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
>> index 8d718d8fd0fe..85e0ef86eb11 100644
>> --- a/fs/binfmt_script.c
>> +++ b/fs/binfmt_script.c
>> @@ -71,39 +56,48 @@ static int load_script(struct linux_binprm *bprm)
>>  	 * parse them on its own.
>>  	 */
>>  	buf_end = bprm->buf + sizeof(bprm->buf) - 1;
>> -	cp = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
>> -	if (!cp) {
>> -		cp = next_non_spacetab(bprm->buf + 2, buf_end);
>> -		if (!cp)
>> +	i_end = strnchr(bprm->buf, sizeof(bprm->buf), '\n');
>> +	if (!i_end) {
>> +		i_end = next_non_spacetab(bprm->buf + 2, buf_end);
>> +		if (!i_end)
>>  			return -ENOEXEC; /* Entire buf is spaces/tabs */
>>  		/*
>>  		 * If there is no later space/tab/NUL we must assume the
>>  		 * interpreter path is truncated.
>>  		 */
>> -		if (!next_terminator(cp, buf_end))
>> +		if (!next_terminator(i_end, buf_end))
>>  			return -ENOEXEC;
>> -		cp = buf_end;
>> +		i_end = buf_end;
>>  	}
>> -	/* NUL-terminate the buffer and any trailing spaces/tabs. */
>> -	*cp = '\0';
>> -	while (cp > bprm->buf) {
>> -		cp--;
>> -		if ((*cp == ' ') || (*cp == '\t'))
>> -			*cp = '\0';
>> -		else
>> -			break;
>> -	}
>> -	for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
>> -	if (*cp == '\0')
>> +	/* Trim any trailing spaces/tabs from i_end */
>> +	while (spacetab(i_end[-1]))
>> +		i_end--;
>> +
>> +	/* Skip over leading spaces/tabs */
>> +	i_name = next_non_spacetab(bprm->buf+2, i_end);
>> +	if (!i_name || (i_name == i_end))
>>  		return -ENOEXEC; /* No interpreter name found */
>> -	i_name = cp;
>> +
>> +	/* Is there an optional argument? */
>>  	i_arg = NULL;
>> -	for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
>> -		/* nothing */ ;
>> -	while ((*cp == ' ') || (*cp == '\t'))
>> -		*cp++ = '\0';
>> -	if (*cp)
>> -		i_arg = cp;
>> +	i_sep = next_terminator(i_name, i_end);
>> +	if (i_sep && (*i_sep != '\0'))
>> +		i_arg = next_non_spacetab(i_sep, i_end);
>> +
>> +	/*
>> +	 * If the script filename will be inaccessible after exec, typically
>> +	 * because it is a "/dev/fd/<fd>/.." path against an O_CLOEXEC fd, give
>> +	 * up now (on the assumption that the interpreter will want to load
>> +	 * this file).
>> +	 */
>> +	if (bprm->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE)
>> +		return -ENOENT;
>> +
>> +	/* Release since we are not mapping a binary into memory. */
>> +	allow_write_access(bprm->file);
>> +	fput(bprm->file);
>> +	bprm->file = NULL;
>> +
>>  	/*
>>  	 * OK, we've parsed out the interpreter name and
>>  	 * (optional) argument.
>> @@ -121,7 +115,9 @@ static int load_script(struct linux_binprm *bprm)
>>  	if (retval < 0)
>>  		return retval;
>>  	bprm->argc++;
>> +	*((char *)i_end) = '\0';
>>  	if (i_arg) {
>> +		*((char *)i_sep) = '\0';
>>  		retval = copy_strings_kernel(1, &i_arg, bprm);
>>  		if (retval < 0)
>>  			return retval;
>
> I think this is all correct, though I'm always suspicious of my visual
> inspection of string parsers. ;)
>
> I was worried that \n might not be handled correctly in some case.
> Previously, any \n was converted into \0 first, so next_terminator()
> didn't need to consider \n separately. (next_non_spacetab() doesn't care,
> since neither \n nor \0 is ' ' or '\t'.) For next_terminator(), though, I
> was worried about a case where *i_end == '\n': next_terminator() would
> then return NULL instead of "last", because *last is '\n' rather than
> '\0'. But you're using the adjusted i_end, so I think it's correct. And
> you've handled i_name == i_end.
>
> I will see if I can find my testing scripts I used when commit
> b5372fe5dc84 originally landed to double-check... until then:
>
> Reviewed-by: Kees Cook <keescook@chromium.org>

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-19  0:33       ` [PATCH v2 7/8] exec: Generic execfd support Eric W. Biederman
@ 2020-05-19 19:46         ` Kees Cook
  2020-05-19 19:54           ` Linus Torvalds
  2020-05-19 21:59         ` Rob Landley
  1 sibling, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-19 19:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:33:46PM -0500, Eric W. Biederman wrote:
> 
> Most of the support for passing the file descriptor of an executable
> to an interpreter already lives in the generic code and in binfmt_elf.
> Rework the fields in binfmt_elf that deal with executable file
> descriptor passing to make executable file descriptor passing a first
> class concept.
> 
> Move the fd_install from binfmt_misc into begin_new_exec after the new
> creds have been installed.  This means that accessing the file through
> /proc/<pid>/fd/N is able to see the creds for the new executable
> before allowing access to the new executable's files.
> 
> Performing the install of the executable's file descriptor after
> the point of no return also means that nothing special needs to
> be done on error.  The exiting of the process will close all
> of its open files.
> 
> Move the would_dump from binfmt_misc into begin_new_exec right
> after would_dump is called on the bprm->file.  This makes it
> obvious this case exists and that no nesting of bprm->file is
> currently supported.
> 
> In binfmt_misc the movement of fd_install into generic code means
> that its special error exit path is no longer needed.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Yes, this is so much nicer. :) My head did spin a little at the changes
to the management of bprm->executable between this patch and the next,
but I'm okay now. ;)
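On the receiving side, as I understand it, an interpreter started through binfmt_misc with the 'O' (open-binary) flag finds the executable's descriptor in the AT_EXECFD entry of its auxiliary vector; that is the descriptor whose fd_install() this patch moves into begin_new_exec(). A minimal userspace sketch of reading it with getauxval(): exec_fd_from_auxv() is a hypothetical helper, and it relies on getauxval() setting errno to ENOENT when the entry is absent (glibc 2.19 and later).

```c
#include <elf.h>
#include <errno.h>
#include <sys/auxv.h>

/* Hypothetical helper: the fd passed via AT_EXECFD, or -1 if none was passed. */
static int exec_fd_from_auxv(void)
{
	unsigned long fd;

	errno = 0;
	fd = getauxval(AT_EXECFD);
	if (fd == 0 && errno == ENOENT)
		return -1;	/* no AT_EXECFD entry: not started with an exec fd */
	return (int)fd;
}
```

A normally started program gets -1; the errno check is needed because 0 is itself a valid descriptor.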

Reviewed-by: Kees Cook <keescook@chromium.org>

nits/thoughts below...

> [...]
> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> index 8c7779d6bf19..653508b25815 100644
> --- a/include/linux/binfmts.h
> +++ b/include/linux/binfmts.h
> [...]
> @@ -48,6 +51,7 @@ struct linux_binprm {
>  	unsigned int taso:1;
>  #endif
>  	unsigned int recursion_depth; /* only for search_binary_handler() */
> +	struct file * executable; /* Executable to pass to the interpreter */
>  	struct file * file;
>  	struct cred *cred;	/* new credentials */

nit: can we fix the "* " stuff here? This should be *file and *executable.

> [...]
> @@ -69,10 +73,6 @@ struct linux_binprm {
>  #define BINPRM_FLAGS_ENFORCE_NONDUMP_BIT 0
>  #define BINPRM_FLAGS_ENFORCE_NONDUMP (1 << BINPRM_FLAGS_ENFORCE_NONDUMP_BIT)
>  
> -/* fd of the binary should be passed to the interpreter */
> -#define BINPRM_FLAGS_EXECFD_BIT 1
> -#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)
> -
>  /* filename of the binary will be inaccessible after exec */
>  #define BINPRM_FLAGS_PATH_INACCESSIBLE_BIT 2
>  #define BINPRM_FLAGS_PATH_INACCESSIBLE (1 << BINPRM_FLAGS_PATH_INACCESSIBLE_BIT)

nit: may as well renumber BINPRM_FLAGS_PATH_INACCESSIBLE_BIT to 1,
they're not UAPI. And, actually, nothing uses the *_BIT defines, so
probably the entire chunk of code could just be reduced to:

/* either interpreter or executable was unreadable */
#define BINPRM_FLAGS_ENFORCE_NONDUMP    BIT(0)
/* filename of the binary will be inaccessible after exec */
#define BINPRM_FLAGS_PATH_INACCESSIBLE  BIT(1)

Though frankly, I wonder if interp_flags could just be removed in favor
of two new bit members, especially since interp_data is gone:

+               /* Either interpreter or executable was unreadable. */
+               nondumpable:1,
+               /* Filename of the binary will be inaccessible after exec. */
+               path_inaccessible:1,
...
-       unsigned interp_flags;
...etc
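The two styles contrasted above, side by side in a self-contained userspace form. BIT() is defined locally as a stand-in for the kernel's macro from <linux/bits.h>, and both structs are hypothetical cut-down models rather than struct linux_binprm.

```c
#include <stdbool.h>

/* Local stand-in for BIT() from the kernel's <linux/bits.h>. */
#define BIT(nr) (1UL << (nr))

/* Style 1: a flags word plus masks, renumbered as suggested. */
#define BINPRM_FLAGS_ENFORCE_NONDUMP	BIT(0)
#define BINPRM_FLAGS_PATH_INACCESSIBLE	BIT(1)

struct flags_word_model {		/* hypothetical, not struct linux_binprm */
	unsigned interp_flags;
};

/* Style 2: one named single-bit member per flag. */
struct flags_bits_model {		/* hypothetical, not struct linux_binprm */
	unsigned int
		/* Either interpreter or executable was unreadable. */
		nondumpable:1,
		/* Filename of the binary will be inaccessible after exec. */
		path_inaccessible:1;
};

static bool word_path_inaccessible(const struct flags_word_model *b)
{
	return b->interp_flags & BINPRM_FLAGS_PATH_INACCESSIBLE;
}

static bool bits_path_inaccessible(const struct flags_bits_model *b)
{
	return b->path_inaccessible;
}
```

With bit members, a test is a plain field access and there is no separate mask or *_BIT number to keep in sync.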

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-19 19:46         ` Kees Cook
@ 2020-05-19 19:54           ` Linus Torvalds
  2020-05-19 20:20             ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Linus Torvalds @ 2020-05-19 19:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric W. Biederman, Linux Kernel Mailing List, Oleg Nesterov,
	Jann Horn, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, LSM List, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Tue, May 19, 2020 at 12:46 PM Kees Cook <keescook@chromium.org> wrote:
>
> Though frankly, I wonder if interp_flags could just be removed in favor
> of two new bit members, especially since interp_data is gone:

Yeah, I think that might be a good cleanup - but please keep it as a
separate thing at the end of the series (or maybe the beginning)

                Linus

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-19 19:54           ` Linus Torvalds
@ 2020-05-19 20:20             ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-19 20:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Linux Kernel Mailing List, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	LSM List, James Morris, Serge E. Hallyn, Andy Lutomirski

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, May 19, 2020 at 12:46 PM Kees Cook <keescook@chromium.org> wrote:
>>
>> Though frankly, I wonder if interp_flags could just be removed in favor
>> of two new bit members, especially since interp_data is gone:
>
> Yeah, I think that might be a good cleanup - but please keep it as a
> separate thing at the end of the series (or maybe the beginning)

I will.

With a little care we can replace setting BINPRM_FLAGS_ENFORCE_NONDUMP
and clearing bprm->mm->dumpable.

Which is the direction I have been looking.

Now that I think about it, I believe that the loop in exec_binprm should
be clearing BINPRM_FLAGS_PATH_INACCESSIBLE, as it is only relevant to
fexecve/execveat with a close-on-exec file descriptor.

Eric

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 8/8] exec: Remove recursion from search_binary_handler
  2020-05-19  0:34       ` [PATCH v2 8/8] exec: Remove recursion from search_binary_handler Eric W. Biederman
@ 2020-05-19 20:37         ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-19 20:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:34:19PM -0500, Eric W. Biederman wrote:
> 
> Recursion in kernel code is generally a bad idea as it can overflow
> the kernel stack.  Recursion in exec also hides that the code is
> looping and that the loop changes bprm->file.
> 
> Instead of recursing in search_binary_handler have the methods that
> would recurse set bprm->interpreter and return 0.  Modify exec_binprm
> to loop when bprm->interpreter is set.  Consolidate all of the
> reassignments of bprm->file in that loop to make it clear what is
> going on.
> 
> The structure of the new loop in exec_binprm is that all errors return
> immediately, while successful completion (ret == 0 &&
> !bprm->interpreter) just breaks out of the loop and runs what
> exec_binprm has always run upon successful completion.
> 
> Fail if an interpreter is being called after execfd has been set.
> The code has never properly handled an interpreter being called with
> execfd set, and with the reassignments of bprm->file and the
> assignment of bprm->executable in generic code it has finally become
> possible to test for, and fail if, this problematic condition happens.
> 
> With the reassignments of bprm->file and the assignment of
> bprm->executable moved into the generic code add a test to see if
> bprm->executable is being reassigned.
> 
> In search_binary_handler remove the test for !bprm->file.  With all
> reassignments of bprm->file moved to exec_binprm bprm->file can never
> be NULL in search_binary_handler.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Lovely!

Reviewed-by: Kees Cook <keescook@chromium.org>

I spent some time following the file lifetimes of deny/allow_write_access()
and the fget/fput() paths. It all looks correct to me; it's tricky
(especially bprm->executable) but so very much cleaner than before. :)

The only suggestion I could come up with is more comments (surprise) to
help anyone new to this loop realize what the "common" path is (and
similarly, a compiler hint too):

diff --git a/fs/exec.c b/fs/exec.c
index a9f421ec9e27..738051a698e1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1790,15 +1790,19 @@ static int exec_binprm(struct linux_binprm *bprm)
 	/* This allows 4 levels of binfmt rewrites before failing hard. */
 	for (depth = 0;; depth++) {
 		struct file *exec;
+
 		if (depth > 5)
 			return -ELOOP;
 
 		ret = search_binary_handler(bprm);
+		/* Unrecoverable error, give up. */
 		if (ret < 0)
 			return ret;
-		if (!bprm->interpreter)
+		/* Found final handler, start execution. */
+		if (likely(!bprm->interpreter))
 			break;
 
+		/* Found an interpreter, so try again and attempt to run it. */
 		exec = bprm->file;
 		bprm->file = bprm->interpreter;
 		bprm->interpreter = NULL;
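The shape of the new loop can be modeled in userspace. This is a hypothetical sketch, not fs/exec.c: struct model_file, search_handler() and exec_loop() are invented for illustration, files are plain array indices, and likely() is omitted. As in the patch, a handler reports an interpreter instead of recursing, errors return immediately, and the depth check turns an interpreter cycle into -ELOOP rather than unbounded recursion.

```c
#include <errno.h>

/* Hypothetical model of a file as seen by the handler search. */
struct model_file {
	int handled;		/* 0: no binfmt recognizes it (-ENOEXEC) */
	int interpreter;	/* index of the interpreter file, or -1 */
};

/* Stand-in for search_binary_handler(): report an interpreter, don't recurse. */
static int search_handler(const struct model_file *files, int file, int *interpreter)
{
	if (!files[file].handled)
		return -ENOEXEC;
	*interpreter = files[file].interpreter;
	return 0;
}

static int exec_loop(const struct model_file *files, int file, int *final)
{
	for (int depth = 0;; depth++) {
		int interpreter = -1;
		int ret;

		if (depth > 5)
			return -ELOOP;
		ret = search_handler(files, file, &interpreter);
		/* Unrecoverable error, give up. */
		if (ret < 0)
			return ret;
		/* Found final handler, start execution. */
		if (interpreter < 0)
			break;
		/* Found an interpreter, so try again and attempt to run it. */
		file = interpreter;
	}
	*final = file;
	return 0;
}
```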

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
  2020-05-19 18:10         ` Kees Cook
@ 2020-05-19 21:28           ` James Morris
  0 siblings, 0 replies; 122+ messages in thread
From: James Morris @ 2020-05-19 21:28 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric W. Biederman, linux-kernel, Linus Torvalds, Oleg Nesterov,
	Jann Horn, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, linux-security-module, Serge E. Hallyn,
	Andy Lutomirski

On Tue, 19 May 2020, Kees Cook wrote:

> >  	/* SELinux context only depends on initial program or script and not
> >  	 * the script interpreter */
> > -	if (bprm->called_set_creds)
> > -		return 0;
> >  
> >  	old_tsec = selinux_cred(current_cred());
> >  	new_tsec = selinux_cred(bprm->cred);
> 
> As you've done in the other LSMs, I think this comment can be removed
> (or moved to the top of the function) too.

I'd prefer moved to top of the function.

-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler
  2020-05-19  0:32       ` [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
  2020-05-19 18:27         ` Kees Cook
@ 2020-05-19 21:30         ` James Morris
  1 sibling, 0 replies; 122+ messages in thread
From: James Morris @ 2020-05-19 21:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, linux-security-module, Serge E. Hallyn,
	Andy Lutomirski

On Mon, 18 May 2020, Eric W. Biederman wrote:

> 
> The code in prepare_binprm needs to be run every time
> search_binary_handler is called so move the call into search_binary_handler
> itself to make the code simpler and easier to understand.
> 
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

Nice cleanup.

Reviewed-by: James Morris <jamorris@linux.microsoft.com>

-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  2020-05-19  0:31       ` [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds Eric W. Biederman
  2020-05-19 18:21         ` Kees Cook
@ 2020-05-19 21:52         ` James Morris
  2020-05-20 12:40           ` Eric W. Biederman
  1 sibling, 1 reply; 122+ messages in thread
From: James Morris @ 2020-05-19 21:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, linux-security-module, Serge E. Hallyn,
	Andy Lutomirski

On Mon, 18 May 2020, Eric W. Biederman wrote:

> diff --git a/fs/exec.c b/fs/exec.c
> index 9e70da47f8d9..8e3b93d51d31 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1366,7 +1366,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	 * the final state of setuid/setgid/fscaps can be merged into the
>  	 * secureexec flag.
>  	 */
> -	bprm->secureexec |= bprm->cap_elevated;
> +	bprm->secureexec |= bprm->active_secureexec;

Which kernel tree are these patches for? Seems like begin_new_exec() is 
from a prerequisite patchset.


-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 0/8] exec: Control flow simplifications
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
                         ` (8 preceding siblings ...)
  2020-05-19  1:25       ` [PATCH v2 0/8] exec: Control flow simplifications Linus Torvalds
@ 2020-05-19 21:55       ` Kees Cook
  2020-05-20 13:02         ` Eric W. Biederman
  2020-05-20 22:12       ` Eric W. Biederman
  10 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-19 21:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Mon, May 18, 2020 at 07:29:00PM -0500, Eric W. Biederman wrote:
>  arch/alpha/kernel/binfmt_loader.c  | 11 +----
>  fs/binfmt_elf.c                    |  4 +-
>  fs/binfmt_elf_fdpic.c              |  4 +-
>  fs/binfmt_em86.c                   | 13 +----
>  fs/binfmt_misc.c                   | 69 ++++-----------------------
>  fs/binfmt_script.c                 | 82 ++++++++++++++------------------
>  fs/exec.c                          | 97 ++++++++++++++++++++++++++------------
>  include/linux/binfmts.h            | 36 ++++++--------
>  include/linux/lsm_hook_defs.h      |  3 +-
>  include/linux/lsm_hooks.h          | 52 +++++++++++---------
>  include/linux/security.h           | 14 ++++--
>  kernel/cred.c                      |  3 ++
>  security/apparmor/domain.c         |  7 +--
>  security/apparmor/include/domain.h |  2 +-
>  security/apparmor/lsm.c            |  2 +-
>  security/commoncap.c               |  9 ++--
>  security/security.c                |  9 +++-
>  security/selinux/hooks.c           |  8 ++--
>  security/smack/smack_lsm.c         |  9 ++--
>  security/tomoyo/tomoyo.c           | 12 ++---
>  20 files changed, 202 insertions(+), 244 deletions(-)

Oh, BTW, heads up on this (trivially but annoyingly) conflicting with
the copy_strings_kernel/copy_string_kernel change:

https://ozlabs.org/~akpm/mmotm/broken-out/exec-simplify-the-copy_strings_kernel-calling-convention.patch

Is it worth pulling that and these into your tree?

https://ozlabs.org/~akpm/mmotm/broken-out/exec-open-code-copy_string_kernel.patch

https://ozlabs.org/~akpm/mmotm/broken-out/umh-fix-refcount-underflow-in-fork_usermode_blob.patch


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-19  0:33       ` [PATCH v2 7/8] exec: Generic execfd support Eric W. Biederman
  2020-05-19 19:46         ` Kees Cook
@ 2020-05-19 21:59         ` Rob Landley
  2020-05-20 16:05           ` Eric W. Biederman
  1 sibling, 1 reply; 122+ messages in thread
From: Rob Landley @ 2020-05-19 21:59 UTC (permalink / raw)
  To: Eric W. Biederman, linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Bernd Edlinger, linux-fsdevel, Al Viro,
	Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On 5/18/20 7:33 PM, Eric W. Biederman wrote:
> 
> Most of the support for passing the file descriptor of an executable
> to an interpreter already lives in the generic code and in binfmt_elf.
> Rework the fields in binfmt_elf that deal with executable file
> descriptor passing to make executable file descriptor passing a first
> class concept.

I was reading this to try to figure out how to do execve(NULL, argv[], envp) to
re-exec self after a vfork() in a chroot with no /proc, and hit the most trivial
quibble ever:

> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1323,7 +1323,10 @@ int begin_new_exec(struct linux_binprm * bprm)
>  	 */
>  	set_mm_exe_file(bprm->mm, bprm->file);
>  
> +	/* If the binary is not readable than enforce mm->dumpable=0 */

then

Rob

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  2020-05-19 21:52         ` James Morris
@ 2020-05-20 12:40           ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-20 12:40 UTC (permalink / raw)
  To: James Morris
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Greg Ungerer, Rob Landley, Bernd Edlinger,
	linux-fsdevel, Al Viro, Alexey Dobriyan, Andrew Morton,
	Casey Schaufler, linux-security-module, Serge E. Hallyn,
	Andy Lutomirski

James Morris <jmorris@namei.org> writes:

> On Mon, 18 May 2020, Eric W. Biederman wrote:
>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 9e70da47f8d9..8e3b93d51d31 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1366,7 +1366,7 @@ int begin_new_exec(struct linux_binprm * bprm)
>>  	 * the final state of setuid/setgid/fscaps can be merged into the
>>  	 * secureexec flag.
>>  	 */
>> -	bprm->secureexec |= bprm->cap_elevated;
>> +	bprm->secureexec |= bprm->active_secureexec;
>
> Which kernel tree are these patches for? Seems like begin_new_exec() is 
> from a prerequisite patchset.

The base is:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next

I should have mentioned that.  I am several rounds deep in cleaning up exec
already.

begin_new_exec is essentially forget_old_exec.

Eric



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 0/8] exec: Control flow simplifications
  2020-05-19 21:55       ` Kees Cook
@ 2020-05-20 13:02         ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-20 13:02 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski, Christoph Hellwig

Kees Cook <keescook@chromium.org> writes:

> On Mon, May 18, 2020 at 07:29:00PM -0500, Eric W. Biederman wrote:
>>  arch/alpha/kernel/binfmt_loader.c  | 11 +----
>>  fs/binfmt_elf.c                    |  4 +-
>>  fs/binfmt_elf_fdpic.c              |  4 +-
>>  fs/binfmt_em86.c                   | 13 +----
>>  fs/binfmt_misc.c                   | 69 ++++-----------------------
>>  fs/binfmt_script.c                 | 82 ++++++++++++++------------------
>>  fs/exec.c                          | 97 ++++++++++++++++++++++++++------------
>>  include/linux/binfmts.h            | 36 ++++++--------
>>  include/linux/lsm_hook_defs.h      |  3 +-
>>  include/linux/lsm_hooks.h          | 52 +++++++++++---------
>>  include/linux/security.h           | 14 ++++--
>>  kernel/cred.c                      |  3 ++
>>  security/apparmor/domain.c         |  7 +--
>>  security/apparmor/include/domain.h |  2 +-
>>  security/apparmor/lsm.c            |  2 +-
>>  security/commoncap.c               |  9 ++--
>>  security/security.c                |  9 +++-
>>  security/selinux/hooks.c           |  8 ++--
>>  security/smack/smack_lsm.c         |  9 ++--
>>  security/tomoyo/tomoyo.c           | 12 ++---
>>  20 files changed, 202 insertions(+), 244 deletions(-)
>
> Oh, BTW, heads up on this (trivially but annoyingly) conflicting with
> the copy_strings_kernel/copy_string/kernel change:
>
> https://ozlabs.org/~akpm/mmotm/broken-out/exec-simplify-the-copy_strings_kernel-calling-convention.patch
>
> Is it worth pulling that and these into your tree?
>
> https://ozlabs.org/~akpm/mmotm/broken-out/exec-open-code-copy_string_kernel.patch
>
> https://ozlabs.org/~akpm/mmotm/broken-out/umh-fix-refcount-underflow-in-fork_usermode_blob.patch

Good question.  It is part of the greater set_fs removal work, and I
don't want to mess that up.

I would love to give copy_string_kernel a length parameter so
binfmt_script would not have to modify its buffer or copy the string
before calling copy_string_kernel.

Hmm.  I already have to call strdup on i_name in bprm_change_interp.
So I probably just want to bite the bullet and figure out a way to do
strdup earlier.
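
For illustration, a minimal userspace sketch of parsing a "#!" line without writing '\0' terminators into the buffer: the interpreter and its optional argument come back as (pointer, length) pairs, so the caller can strndup() them only once parsing has succeeded. parse_shebang() is a hypothetical name; this is not the binfmt_script code, and it ignores details such as a line truncated at the end of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, non-destructive "#!" parser.  Returns 0 on success and
 * fills in (pointer, length) pairs that point into buf; returns -1 if
 * buf does not start with a usable "#!" line.  buf is never modified. */
int parse_shebang(const char *buf, size_t len,
                  const char **interp, size_t *interp_len,
                  const char **arg, size_t *arg_len)
{
    const char *end = buf + len;
    const char *p, *eol;

    if (len < 2 || buf[0] != '#' || buf[1] != '!')
        return -1;

    /* The "#!" line ends at the first newline or NUL, or at the buffer end. */
    eol = buf + 2;
    while (eol < end && *eol != '\n' && *eol != '\0')
        eol++;

    /* Skip blanks, then take the interpreter name up to the next blank. */
    p = buf + 2;
    while (p < eol && (*p == ' ' || *p == '\t'))
        p++;
    *interp = p;
    while (p < eol && *p != ' ' && *p != '\t')
        p++;
    *interp_len = (size_t)(p - *interp);
    if (*interp_len == 0)
        return -1;

    /* Whatever remains, minus surrounding blanks, is the single optional
     * argument (possibly empty). */
    while (p < eol && (*p == ' ' || *p == '\t'))
        p++;
    while (eol > p && (eol[-1] == ' ' || eol[-1] == '\t'))
        eol--;
    *arg = p;
    *arg_len = (size_t)(eol - p);
    return 0;
}
```

A caller would then do something like strndup(interp, interp_len) before handing the name on, which is the "strdup earlier" idea in sketch form.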

So unless it makes things easier for Andrew I think it is probably
easier to live with the conflict for now, and use this conversation
as inspiration for my next round of cleanups of binfmt_misc.

Eric


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-19 21:59         ` Rob Landley
@ 2020-05-20 16:05           ` Eric W. Biederman
  2020-05-21 22:50             ` Rob Landley
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-20 16:05 UTC (permalink / raw)
  To: Rob Landley
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Greg Ungerer, Bernd Edlinger, linux-fsdevel, Al Viro,
	Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Rob Landley <rob@landley.net> writes:

> On 5/18/20 7:33 PM, Eric W. Biederman wrote:
>> 
>> Most of the support for passing the file descriptor of an executable
>> to an interpreter already lives in the generic code and in binfmt_elf.
>> Rework the fields in binfmt_elf that deal with executable file
>> descriptor passing to make executable file descriptor passing a first
>> class concept.
>
> I was reading this to try to figure out how to do execve(NULL, argv[], envp) to
> re-exec self after a vfork() in a chroot with no /proc, and hit the most trivial
> quibble ever:

We have /proc/self/exe today.  If I understand you correctly you would
like to do the equivalent of 'execve("/proc/self/exe", argv[], envp[])'
without having proc mounted.

The file descriptor is stored in mm->exe_file.

Probably the most straightforward implementation is to allow
execveat(AT_EXE_FILE, ...).

You can look at binfmt_misc for how to reopen an open file descriptor.
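
Until something like AT_EXE_FILE exists (it is only a proposal here), one workaround with existing system calls is to open the binary while its path is still reachable, for example before chroot(), keep the descriptor, and later re-exec through it with execveat(fd, "", argv, envp, AT_EMPTY_PATH), which does not go through /proc. A minimal sketch under those assumptions; open_self_exe(), exec_by_fd() and run_fd_and_wait() are hypothetical helpers, and the raw syscall is used because older C libraries have no execveat() wrapper.

```c
#define _GNU_SOURCE
#include <fcntl.h>         /* open(), O_CLOEXEC, AT_EMPTY_PATH */
#include <sys/syscall.h>   /* SYS_execveat */
#include <sys/types.h>
#include <sys/wait.h>      /* waitpid() */
#include <unistd.h>        /* syscall(), fork(), _exit() */

/* Open the binary once, while its path is still reachable (e.g. before
 * chroot()), and keep the descriptor for later re-execs.  O_CLOEXEC keeps
 * it from leaking into the programs we run. */
int open_self_exe(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Execute the file behind fd without going through /proc.  Only returns
 * on failure, with errno set. */
int exec_by_fd(int fd, char *const argv[], char *const envp[])
{
    return (int)syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
}

/* Usage helper: fork, exec through fd in the child, and return the
 * child's exit status (or -1 on error). */
int run_fd_and_wait(int fd, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        exec_by_fd(fd, argv, envp);
        _exit(127); /* only reached if the exec failed */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

One caveat: if the descriptor is close-on-exec and the file is a script rather than an ELF binary, execveat() fails with ENOENT, because the interpreter would have no way to reopen it. POSIX fexecve(3) is the portable spelling, though depending on the libc it may fall back to /proc/self/fd.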

>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1323,7 +1323,10 @@ int begin_new_exec(struct linux_binprm * bprm)
>>  	 */
>>  	set_mm_exe_file(bprm->mm, bprm->file);
>>  
>> +	/* If the binary is not readable than enforce mm->dumpable=0 */
>
> then

It took me a minute, but yes, good catch.

Eric

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  2020-05-19 19:14             ` Kees Cook
@ 2020-05-20 20:22               ` Eric W. Biederman
  2020-05-20 20:53                 ` Kees Cook
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-20 20:22 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Tue, May 19, 2020 at 02:03:23PM -0500, Eric W. Biederman wrote:
>> Kees Cook <keescook@chromium.org> writes:
>> 
>> > On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
>> >> [...]
>> >> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
>> >> index d1217fcdedea..8605ab4a0f89 100644
>> >> --- a/include/linux/binfmts.h
>> >> +++ b/include/linux/binfmts.h
>> >> @@ -27,10 +27,10 @@ struct linux_binprm {
>> >>  	unsigned long argmin; /* rlimit marker for copy_strings() */
>> >>  	unsigned int
>> >>  		/*
>> >> -		 * True if most recent call to cap_bprm_set_creds
>> >> +		 * True if most recent call to security_bprm_set_creds
>> >>  		 * resulted in elevated privileges.
>> >>  		 */
>> >> -		cap_elevated:1,
>> >> +		active_secureexec:1,
>> >
>> > Also, I'd like it if this comment could be made more verbose as well, for
>> > anyone trying to understand the binfmt execution flow for the first time.
>> > Perhaps:
>> >
>> > 		/*
>> > 		 * Must be set True during any call to
>> > 		 * bprm_set_creds hook where the execution would
>> > 		 * result in elevated privileges. (The hook can be
>> > 		 * called multiple times during nested interpreter
>> > 		 * resolution across binfmt_script, binfmt_misc, etc).
>> > 		 */
>> Well it is not during but after the call that it becomes true.
>> I think most recent covers the case of multiple calls.
>
> I'm thinking of an LSM writer reading these comments to decide what
> they need to do to the flags, so it's a direction to them to set it to
> true if they have determined that privilege was gained. (Though in
> theory, this is all moot since only the commoncap hook cares.)

The comments for an LSM writer are in include/linux/lsm_hooks.h

 * @bprm_repopulate_creds:
 *	Assuming that the relevant bits of @bprm->cred->security have been
 *	previously set, examine @bprm->file and regenerate them.  This is
 *	so that the credentials derived from the interpreter the code is
 *	actually going to run are used rather than credentials derived
 *	from a script.  This is done because the interpreter binary needs to
 *	reopen the script, and may end up opening something completely different.
 *	This hook may also optionally check permissions (e.g. for
 *	transitions between security domains).
 *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
 *	request libc enable secure mode.
 *	@bprm contains the linux_binprm structure.
 *	Return 0 if the hook is successful and permission is granted.

I hope that is detailed enough.

I will leave the rest of the comments for the maintainer of the code.

I really don't think we should duplicate the prescriptive comments in
multiple locations.
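
On the userspace side, the effect of active_secureexec ending up in bprm->secureexec is the AT_SECURE entry in the new program's auxiliary vector. A small sketch of how a program can check it, assuming a libc that provides getauxval(3) (glibc 2.16 or later, or musl); running_in_secure_mode() is a hypothetical name.

```c
#include <sys/auxv.h>   /* getauxval(), AT_SECURE */

/* Returns 1 if the kernel asked libc to run this process in secure mode
 * (AT_SECURE set, e.g. after a setuid exec), 0 otherwise.  getauxval()
 * returns 0 for a missing entry, which also means "not secure". */
int running_in_secure_mode(void)
{
    return getauxval(AT_SECURE) != 0;
}
```

In secure mode the dynamic loader ignores or restricts environment variables such as LD_PRELOAD and LD_LIBRARY_PATH, which is why the flag has to be computed correctly.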

Eric

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
  2020-05-20 20:22               ` Eric W. Biederman
@ 2020-05-20 20:53                 ` Kees Cook
  0 siblings, 0 replies; 122+ messages in thread
From: Kees Cook @ 2020-05-20 20:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Wed, May 20, 2020 at 03:22:38PM -0500, Eric W. Biederman wrote:
> Kees Cook <keescook@chromium.org> writes:
> 
> > On Tue, May 19, 2020 at 02:03:23PM -0500, Eric W. Biederman wrote:
> >> Kees Cook <keescook@chromium.org> writes:
> >> 
> >> > On Mon, May 18, 2020 at 07:31:14PM -0500, Eric W. Biederman wrote:
> >> >> [...]
> >> >> diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
> >> >> index d1217fcdedea..8605ab4a0f89 100644
> >> >> --- a/include/linux/binfmts.h
> >> >> +++ b/include/linux/binfmts.h
> >> >> @@ -27,10 +27,10 @@ struct linux_binprm {
> >> >>  	unsigned long argmin; /* rlimit marker for copy_strings() */
> >> >>  	unsigned int
> >> >>  		/*
> >> >> -		 * True if most recent call to cap_bprm_set_creds
> >> >> +		 * True if most recent call to security_bprm_set_creds
> >> >>  		 * resulted in elevated privileges.
> >> >>  		 */
> >> >> -		cap_elevated:1,
> >> >> +		active_secureexec:1,
> >> >
> >> > Also, I'd like it if this comment could be made more verbose as well, for
> >> > anyone trying to understand the binfmt execution flow for the first time.
> >> > Perhaps:
> >> >
> >> > 		/*
> >> > 		 * Must be set True during any call to
> >> > 		 * bprm_set_creds hook where the execution would
> >> > 		 * result in elevated privileges. (The hook can be
> >> > 		 * called multiple times during nested interpreter
> >> > 		 * resolution across binfmt_script, binfmt_misc, etc).
> >> > 		 */
> >> Well it is not during but after the call that it becomes true.
> >> I think most recent covers the case of multiple calls.
> >
> > I'm thinking of an LSM writer reading these comments to decide what
> > they need to do to the flags, so it's a direction to them to set it to
> > true if they have determined that privilege was gained. (Though in
> > theory, this is all moot since only the commoncap hook cares.)
> 
> The comments for an LSM writer are in include/linux/lsm_hooks.h
> 
>  * @bprm_repopulate_creds:
>  *	Assuming that the relevant bits of @bprm->cred->security have been
>  *	previously set, examine @bprm->file and regenerate them.  This is
>  *	so that the credentials derived from the interpreter the code is
>  *	actually going to run are used rather than credentials derived
>  *	from a script.  This is done because the interpreter binary needs to
>  *	reopen the script, and may end up opening something completely different.
>  *	This hook may also optionally check permissions (e.g. for
>  *	transitions between security domains).
>  *	The hook must set @bprm->active_secureexec to 1 if AT_SECURE should be set to
>  *	request libc enable secure mode.
>  *	@bprm contains the linux_binprm structure.
>  *	Return 0 if the hook is successful and permission is granted.
> 
> I hope that is detailed enough.
> 
> I will leave the rest of the comments for the maintainer of the code.
> 
> I really don't think we should duplicate the prescriptive comments in
> multiple locations.

Okay, that's fair enough. Thanks!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 0/8] exec: Control flow simplifications
  2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
                         ` (9 preceding siblings ...)
  2020-05-19 21:55       ` Kees Cook
@ 2020-05-20 22:12       ` Eric W. Biederman
  2020-05-20 23:43         ` Kees Cook
  10 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-20 22:12 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Oleg Nesterov, Jann Horn, Kees Cook,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


I have pushed this out to:

git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next

I have collected up the acks and reviewed-by's, and fixed a couple of
typos but that is it.

If we need comment fixes or additional cleanups we can apply them on top
of this series.   This way the code can sit in linux-next until the
merge window opens.

Before I pushed this out I also tested this with Kees's new test of
binfmt_misc and did not find any problems.

Eric

The git range-diff of the changes I applied before pushing this out:

1:  f6bb0d6563ca ! 1:  87b047d2be41 exec: Teach prepare_exec_creds how exec treats uids & gids
    @@ Commit message
         update bprm->cred are just need to handle special cases such
         as setuid exec and change of domains.
     
    +    Link: https://lkml.kernel.org/r/871rng22dm.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## kernel/cred.c ##
2:  d3b3594be22f ! 2:  b8bff599261c exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds
    @@ Commit message
         Add or upate comments a appropriate to bring them up to date and
         to reflect this change.
     
    +    Link: https://lkml.kernel.org/r/87v9kszrzh.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Acked-by: Casey Schaufler <casey@schaufler-ca.com> # For the LSM and Smack bits
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## fs/exec.c ##
3:  65c651a77967 ! 3:  d9d67b76eed6 exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds
    @@ Commit message
         In short two renames and a move in the location of initializing
         bprm->active_secureexec.
     
    +    Link: https://lkml.kernel.org/r/87o8qkzrxp.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## fs/exec.c ##
4:  6d0d5da2b45e ! 4:  dbf17e846ea9 exec: Allow load_misc_binary to call prepare_binfmt unconditionally
    @@ Metadata
     Author: Eric W. Biederman <ebiederm@xmission.com>
     
      ## Commit message ##
    -    exec: Allow load_misc_binary to call prepare_binfmt unconditionally
    +    exec: Allow load_misc_binary to call prepare_binprm unconditionally
     
         Add a flag preserve_creds that binfmt_misc can set to prevent
         credentials from being updated.  This allows binfmt_misc to always
    -    call prepare_binfmt.  Allowing the credential computation logic to be
    +    call prepare_binprm.  Allowing the credential computation logic to be
         consolidated.
     
         Not replacing the credentials with the interpreters credentials is
    @@ Commit message
         exec sees.
     
         Ref: c407c033de84 ("[PATCH] binfmt_misc: improve calculation of interpreter's credentials")
    +    Link: https://lkml.kernel.org/r/87imgszrwo.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## fs/binfmt_misc.c ##
5:  af7db65c2483 ! 5:  8a8f3bb8ec41 exec: Move the call of prepare_binprm into search_binary_handler
    @@ Commit message
         search_binary_handler is called so move the call into search_binary_handler
         itself to make the code simpler and easier to understand.
     
    +    Link: https://lkml.kernel.org/r/87d070zrvx.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
    +    Reviewed-by: James Morris <jamorris@linux.microsoft.com>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## arch/alpha/kernel/binfmt_loader.c ##
6:  69fccdf33a87 ! 6:  01dbc34d75bf exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC
    @@ Commit message
         has been take that the logic of the parsing code (short of replacing
         characters by '\0') remains the same.
     
    +    Link: https://lkml.kernel.org/r/874ksczru6.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## fs/binfmt_script.c ##
7:  30fe957c6dce ! 7:  6962a6b4de92 exec: Generic execfd support
    @@ Commit message
         In binfmt_misc the movement of fd_install into generic code means
         that it's special error exit path is no longer needed.
     
    +    Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## fs/binfmt_elf.c ##
    @@ fs/exec.c: int begin_new_exec(struct linux_binprm * bprm)
      	 */
      	set_mm_exe_file(bprm->mm, bprm->file);
      
    -+	/* If the binary is not readable than enforce mm->dumpable=0 */
    ++	/* If the binary is not readable then enforce mm->dumpable=0 */
      	would_dump(bprm, bprm->file);
     +	if (bprm->have_execfd)
     +		would_dump(bprm, bprm->executable);
8:  f0a27d0fde69 ! 8:  226ce5863881 exec: Remove recursion from search_binary_handler
    @@ Commit message
         reassignments of bprm->file moved to exec_binprm bprm->file can never
         be NULL in search_binary_handler.
     
    +    Link: https://lkml.kernel.org/r/87sgfwyd84.fsf_-_@x220.int.ebiederm.org
    +    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    +    Reviewed-by: Kees Cook <keescook@chromium.org>
         Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
     
      ## arch/alpha/kernel/binfmt_loader.c ##


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 0/8] exec: Control flow simplifications
  2020-05-20 22:12       ` Eric W. Biederman
@ 2020-05-20 23:43         ` Kees Cook
  2020-05-21 11:53           ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Kees Cook @ 2020-05-20 23:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On Wed, May 20, 2020 at 05:12:10PM -0500, Eric W. Biederman wrote:
> 
> I have pushed this out to:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next
> 
> I have collected up the acks and reviewed-by's, and fixed a couple of
> typos but that is it.

Awesome!

> If we need comment fixes or additional cleanups we can apply them on top
> of this series.   This way the code can sit in linux-next until the
> merge window opens.
> 
> Before I pushed this out I also tested this with Kees's new test of
> binfmt_misc and did not find any problems.

Did this mean to say binfmt_script? It'd be nice to get a binfmt_misc
test too, though.

Thanks!

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 0/8] exec: Control flow simplifications
  2020-05-20 23:43         ` Kees Cook
@ 2020-05-21 11:53           ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-21 11:53 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Greg Ungerer, Rob Landley, Bernd Edlinger, linux-fsdevel,
	Al Viro, Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Kees Cook <keescook@chromium.org> writes:

> On Wed, May 20, 2020 at 05:12:10PM -0500, Eric W. Biederman wrote:
>> 
>> I have pushed this out to:
>> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git exec-next
>> 
>> I have collected up the acks and reviewed-by's, and fixed a couple of
>> typos but that is it.
>
> Awesome!
>
>> If we need comment fixes or additional cleanups we can apply them on top
>> of this series.   This way the code can sit in linux-next until the
>> merge window opens.
>> 
>> Before I pushed this out I also tested this with Kees's new test of
>> binfmt_misc and did not find any problems.
>
> Did this mean to say binfmt_script? It'd be nice to get a binfmt_misc
> test too, though.

Yes.  Sorry.  I meant your binfmt_script test.

Eric

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-20 16:05           ` Eric W. Biederman
@ 2020-05-21 22:50             ` Rob Landley
  2020-05-22  3:28               ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Rob Landley @ 2020-05-21 22:50 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Greg Ungerer, Bernd Edlinger, linux-fsdevel, Al Viro,
	Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On 5/20/20 11:05 AM, Eric W. Biederman wrote:
> Rob Landley <rob@landley.net> writes:
> 
>> On 5/18/20 7:33 PM, Eric W. Biederman wrote:
>>>
>>> Most of the support for passing the file descriptor of an executable
>>> to an interpreter already lives in the generic code and in binfmt_elf.
>>> Rework the fields in binfmt_elf that deal with executable file
>>> descriptor passing to make executable file descriptor passing a first
>>> class concept.
>>
>> I was reading this to try to figure out how to do execve(NULL, argv[], envp) to
>> re-exec self after a vfork() in a chroot with no /proc, and hit the most trivial
>> quibble ever:
> 
> We have /proc/self/exe today.

Not when you first enter a container that's just created a new namespace, or
initramfs first launches PID 1 and runs a shell script to set up the environment
and your (subshell) and background& support only has vfork and not fork, or just
plain "somebody did a chroot"...

(Yes a nommu system with range registers can want _security_ without
_address_translation_. Strange but true! I haven't actually sat down to try to
implement nommu containers yet, but I've done worse things on many occasions.
Remember: the S in IoT stands for Security.)

> If I understand you correctly you would
> like to do the equivalent of 'execve("/proc/self/exe", argv[], envp[])'
> without having proc mounted.

Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
bash-compatible shell with nommu support, which means in order to do subshell
and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
have the child exec itself to unblock the parent, and then read the context data
that just got discarded through the pipe from the parent. ("Wheee." And you can
quote me on that.)
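
The pipe-plus-vfork dance described above could look roughly like the following sketch. spawn_with_context() is a hypothetical helper: it vfork()s, has the child connect its stdin to a pipe and execve() the target, and the parent, which only resumes after that exec, then writes the context blob into the pipe. In the scheme above the target would be the shell re-executing itself; the test uses /bin/sh so it can run standalone.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `path` with its stdin reading from a pipe,
 * then feed it `ctx`.  Returns the child's pid, or -1 on error. */
pid_t spawn_with_context(const char *path, char *const argv[],
                         char *const envp[], const void *ctx, size_t len)
{
    const char *p = ctx;
    int fds[2];
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;

    pid = vfork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: shares the parent's memory until execve(), so stick to
         * dup2()/close()/execve()/_exit() and modify no variables. */
        if (fds[0] != STDIN_FILENO) {
            if (dup2(fds[0], STDIN_FILENO) < 0)
                _exit(127);
            close(fds[0]);
        }
        close(fds[1]);
        execve(path, argv, envp);
        _exit(127);
    }

    /* Parent: vfork() returns here only after the child exec'd or exited,
     * so the context can now be streamed to the new process. */
    close(fds[0]);
    while (len > 0) {
        ssize_t n = write(fds[1], p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        p += n;
        len -= (size_t)n;
    }
    close(fds[1]);
    return pid;
}
```

A real caller would also ignore or block SIGPIPE, since the parent would get SIGPIPE if the child died before the context was written.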

I've implemented that already
(https://github.com/landley/toybox/blob/0.8.3/toys/pending/sh.c#L674, and
reentry is L2516; yeah, it's a work in progress), but "exec self" requires
/proc/self/exe. And since I gave up on getting
http://lkml.iu.edu/hypermail/linux/kernel/2005.1/09399.html in (I should
apologize to Randy, but I just haven't got the spoons to face
https://landley.net/notes-2017.html#14-09-2017 again; three strikes and the
patch stays out), I need /init to be a shell script to set up an initramfs
that's made by pointing CONFIG_INITRAMFS_SOURCE at a directory that was made
without running the build as root. Such a directory has no /dev/console,
because you can't mknod as a non-root user.

Maybe instead of fixing CONFIG_DEVTMPFS_MOUNT to apply to initramfs, I could
add a CONFIG_INITRAMFS_EXTRA=blah.txt to usr/{Kconfig,Makefile} to
append user-supplied extra lines to the end of the gen_initramfs.sh output and
make a /dev/console that way (kinda like genext2fs and mksquashfs), but getting
that in through the linux-kernel bureaucracy means consulting a 27 step
checklist supplementing the basic 17 step submission procedure (with
bibliographic references) explaining how to fill out the forms, perform the
validation steps, go through the proper channels, and get the appropriate series
of signatures and approvals, and I just haven't got the stomach for it anymore.
I was participating here as a hobbyist. Linux-kernel has aged into a rigid
bureaucracy. It's no fun anymore.

Which means any kernel patch I write I have to forward port regularly, sometimes
for a very long time. Heck, I gave linux-kernel three strikes at miniconfig
fifteen years ago now:

  http://lkml.iu.edu/hypermail/linux/kernel/0511.2/0479.html
  https://lwn.net/Articles/161086/
  https://lkml.org/lkml/2006/7/6/404

And was still maintaining it out of tree a decade later:

  https://landley.net/aboriginal/FAQ.html#dev_miniconfig
  https://github.com/landley/aboriginal/blob/master/more/miniconfig.sh

These days I've moved on to a microconfig format that mostly fits on one line,
à la the KCONF= stuff in toybox's built-in mkroot script:

  https://github.com/landley/toybox/blob/master/scripts/mkroot.sh#L136

For example, the User Mode Linux miniconfig from my ancient
https://landley.net/writing/docs/UML.html would translate to microconfig as:

  BINFMT_ELF,HOSTFS,LBD,BLK_DEV,BLK_DEV_LOOP,STDERR_CONSOLE,UNIX98_PTYS,EXT2_FS

The current kernel also needs "64BIT" because my host toolchain doesn't have the
-m32 headers installed, but then it builds fine, à la:

  make ARCH=um allnoconfig KCONFIG_ALLCONFIG=<(echo \
    BINFMT_ELF,HOSTFS,LBD,BLK_DEV,BLK_DEV_LOOP,STDERR_CONSOLE,UNIX98_PTYS,EXT2_FS,64BIT \
    | sed -E 's/([^,]*)(,|$)/CONFIG_\1=y\n/g')
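
The sed expression just splits the list on commas and wraps each symbol. For anyone who would rather not depend on GNU sed's handling of \n in the replacement, a hypothetical C equivalent might look like this sketch; expand_microconfig() is an illustrative name, and unlike the one-liner it skips empty entries such as a doubled comma.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Expand "BINFMT_ELF,HOSTFS" into "CONFIG_BINFMT_ELF=y\nCONFIG_HOSTFS=y\n".
 * Returns the number of bytes written (excluding the NUL), or -1 if out
 * is too small. */
int expand_microconfig(const char *in, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    while (*in) {
        size_t n = strcspn(in, ",");   /* length of the current symbol */

        if (n > 0) {                   /* skip empty entries such as ",," */
            int w = snprintf(out + used, outlen - used,
                             "CONFIG_%.*s=y\n", (int)n, in);
            if (w < 0 || (size_t)w >= outlen - used)
                return -1;             /* output buffer too small */
            used += (size_t)w;
        }
        in += n;
        if (*in == ',')
            in++;
    }
    out[used] = '\0';
    return (int)used;
}
```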

Of course running the resulting ./linux says:

  Checking PROT_EXEC mmap in /dev/shm...Operation not permitted
  /dev/shm must be not mounted noexec

But *shrug*, Devuan did that, not me. I haven't really used UML since QEMU
started working. Shouldn't the old "create file, map file, delete file" trick
stop flushing the data to backing store no matter where the file lives? I mean,
that trick dates back to the VAX, and we argued about it on the UML list a
decade ago (circa
https://sourceforge.net/p/user-mode-linux/mailman/message/14000710/) but...
fixing random things that are wrong with Linux is not my problem anymore. I'm
only in this thread because I'm cc'd.
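
For reference, the "create file, map file, delete file" trick in its simplest form, as a sketch: map_unlinked_file() is a hypothetical helper that creates a temporary file, sizes it, maps it MAP_SHARED and unlinks it, so the mapping outlives the name. Asking for PROT_EXEC on a file in a noexec mount makes mmap() fail with EPERM, which is consistent with the "Operation not permitted" message above. Whether the kernel then avoids writing the data back for the unlinked file is the open question raised above; the sketch only shows the mechanics.

```c
#define _GNU_SOURCE
#include <stdio.h>       /* snprintf() */
#include <stdlib.h>      /* mkstemp() */
#include <sys/mman.h>    /* mmap(), MAP_FAILED */
#include <sys/types.h>
#include <unistd.h>      /* unlink(), ftruncate(), close() */

/* Create a temporary file in dir, size it to len, map it shared with the
 * requested protection, and unlink it.  Returns MAP_FAILED on error. */
void *map_unlinked_file(const char *dir, size_t len, int prot)
{
    char path[4096];
    void *p;
    int fd, n;

    n = snprintf(path, sizeof path, "%s/mapXXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return MAP_FAILED;
    fd = mkstemp(path);
    if (fd < 0)
        return MAP_FAILED;
    unlink(path);                 /* drop the name right away */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    p = mmap(NULL, len, prot, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping keeps the file alive */
    return p;
}
```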

Spending five years repeatedly posting perl removal patches and ending up with
intentional sabotage at the end from the guy who'd added perl in the first place
when the Gratuitous Build Dependency Removal patches finally got traction
(https://landley.net/notes-2013.html#28-03-2013) kinda put me off doing that again.

> The file descriptor is stored in mm->exe_file.
> Probably the most straightforward implementation is to allow
> execveat(AT_EXE_FILE, ...).

Cool, that works.

> You can look at binfmt_misc for how to reopen an open file descriptor.

Added to the todo heap.

Thanks,

Rob

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-21 22:50             ` Rob Landley
@ 2020-05-22  3:28               ` Eric W. Biederman
  2020-05-22  4:51                 ` Rob Landley
  0 siblings, 1 reply; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-22  3:28 UTC (permalink / raw)
  To: Rob Landley
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Greg Ungerer, Bernd Edlinger, linux-fsdevel, Al Viro,
	Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski


Rob Landley <rob@landley.net> writes:

> On 5/20/20 11:05 AM, Eric W. Biederman wrote:

> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
> bash-compatible shell with nommu support, which means in order to do subshell
> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
> have the child exec itself to unblock the parent, and then read the context data
> that just got discarded through the pipe from the parent. ("Wheee." And you can
> quote me on that.)

Do you have clone(CLONE_VM) ?  If my quick skim of the kernel sources is
correct that should be the same as vfork except without causing the
parent to wait for you.  Which I think would remove the need to reexec
yourself.

>> The file descriptor is stored in mm->exe_file.
>> Probably the most straightforward implementation is to allow
>> execveat(AT_EXE_FILE, ...).
>
> Cool, that works.
>
>> You can look at binfmt_misc for how to reopen an open file descriptor.
>
> Added to the todo heap.

Yes I don't think it would be a lot of code.

I think you might be better served with clone(CLONE_VM) as it doesn't
block so you don't need to feed yourself your context over a pipe.
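
A minimal sketch of the clone(CLONE_VM) idea, assuming glibc's clone() wrapper and a downward-growing stack (true on most architectures). clone_vm_demo() and shared_counter are hypothetical names. The parent keeps running right away, unlike with vfork(), but nothing is copied: the child's write to shared_counter lands in the parent's memory.

```c
#define _GNU_SOURCE
#include <sched.h>       /* clone(), CLONE_VM */
#include <signal.h>      /* SIGCHLD */
#include <stdlib.h>      /* malloc(), free() */
#include <sys/types.h>
#include <sys/wait.h>    /* waitpid() */

static volatile int shared_counter;   /* hypothetical shared state */

static int child_fn(void *arg)
{
    /* Same address space as the parent: this write is visible to it. */
    shared_counter += *(int *)arg;
    return 0;
}

/* Returns the counter value the parent sees after the child exits,
 * or -1 on error. */
int clone_vm_demo(int delta)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    int status;
    pid_t pid;

    if (!stack)
        return -1;
    /* The child needs its own stack; pass its top since stacks grow down. */
    pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, &delta);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    /* Unlike vfork(), the parent is not paused here; it must not touch
     * shared_counter (or free the stack) until the child is done. */
    if (waitpid(pid, &status, 0) != pid) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared_counter;
}
```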

Eric

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-22  3:28               ` Eric W. Biederman
@ 2020-05-22  4:51                 ` Rob Landley
  2020-05-22 13:35                   ` Eric W. Biederman
  0 siblings, 1 reply; 122+ messages in thread
From: Rob Landley @ 2020-05-22  4:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Greg Ungerer, Bernd Edlinger, linux-fsdevel, Al Viro,
	Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

On 5/21/20 10:28 PM, Eric W. Biederman wrote:
> 
> Rob Landley <rob@landley.net> writes:
> 
>> On 5/20/20 11:05 AM, Eric W. Biederman wrote:
> 
>> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
>> bash-compatible shell with nommu support, which means in order to do subshell
>> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
>> have the child exec itself to unblock the parent, and then read the context data
>> that just got discarded through the pipe from the parent. ("Wheee." And you can
>> quote me on that.)
> 
> Do you have clone(CLONE_VM) ?  If my quick skim of the kernel sources is
> correct that should be the same as vfork except without causing the
> parent to wait for you.  Which I think would remove the need to reexec
> yourself.

As with perpetual motion, that only seems like it would work if you don't
understand what's going on.

A nommu system uses physical addresses, not virtual ones, so every process sees
the same addresses. So if I allocate a new block of memory and memcpy the
contents of the old one into the new one, any pointers in the copy point back
into the ORIGINAL block of memory. Trying to adjust the pointers in the copy is
the exact same problem as trying to do garbage collection in C: it's an
AI-complete problem.
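
The stale-pointer problem in miniature: struct ctx, copied_pointer_is_stale() and relocate_cursor() below are hypothetical, but they show that a plain memcpy() of an object containing a pointer into itself yields a copy whose pointer still refers to the original. Fixing this one field is easy because we know its type and layout; the hard part described above is that a fork on nommu would have to find every such value in arbitrary memory without that knowledge.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical object holding a pointer into its own storage. */
struct ctx {
    char name[16];
    char *cursor;   /* points somewhere inside name[] */
};

/* Copy *src into *dst the way a naive "copy the memory" fork would, then
 * report whether dst->cursor still points into src (1) or not (0). */
int copied_pointer_is_stale(struct ctx *dst, const struct ctx *src)
{
    uintptr_t lo = (uintptr_t)src->name;
    uintptr_t hi = lo + sizeof(src->name);
    uintptr_t c;

    memcpy(dst, src, sizeof(*dst));
    c = (uintptr_t)dst->cursor;
    return c >= lo && c < hi;
}

/* The fix-up, possible only because we know which field is a pointer. */
void relocate_cursor(struct ctx *dst, const struct ctx *src)
{
    dst->cursor = dst->name + (src->cursor - src->name);
}
```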

Any attempt to "implement a full fork" on nommu hits this problem: copying an
existing mapping to a new address range means any address values in the new
mapping point into the OLD mapping. Things like fdpic fix this up at exec time
(traversing elf tables and relocating), but not at runtime. If you can solve the
"relocate at runtime all addresses within an existing mapping, and all other
mappings that might point to this mapping, including local variables on the
stack that point to a structure member or halfway into a string rather than the
start of an allocation, without adjusting unrelated values coincidentally within
RANGE of a mapping" problem, THEN you can fork on a nommu system.

What vfork() does is pause the parent and have the child continue AS the parent
for a bit (with the system call returning 0). The child starts with all the same
memory mappings the parent has (usually not even a new stack). The child has a
new PID and new resources like its own file descriptor table, so close() and
open() don't affect the parent, but any global it changes is visible to the
parent when it resumes (and often local variables too: don't return from the
function that called vfork(), because if you DON'T have a new stack it'll stomp
the return address the parent needs when IT returns). If the child calls
malloc(), the parent needs to free it, because it's the same heap (the same
mapping of the same physical memory).

Then when the child is ready to discard all those mappings (by calling either
execve() or _exit(); those are the only two options), the parent resumes
from where it left off with the PID of the child as the system call return value.

The reason vfork() pauses the parent is so that only one process is ever using
those mappings at a given time. Otherwise they're acting like threads without
locking, and usually both are sharing a stack.

P.S. You can use threads _instead_ of fork for some stuff on nommu, but that's
its own can of worms. You still need to vfork() when you do create a child
process you're going to exec, so it doesn't go away, you're just requiring
multiple techniques simultaneously to handle a special case.

P.P.S. vfork() is useful on mmu systems to solve the "don't fork from a thread"
problem. You can vfork() from a thread cheaply and reliably and it only pauses
the one thread you forked from, not every thread in the whole process. If you
fork() from a heavily threaded process you can cause a multi-millisecond latency
spike because even with an mmu the copy on write "keep track of what's shared by
what" generally can't handle the "threads AND processes sharing mappings" case,
so it just gives up and copies it all at fork time, in one go, holding a big
lock while doing so. This causes a large latency spike which vfork() avoids.
(And can cause a large wasteful allocation and memory dirtying which is
immediately freed.)

>>> The file descriptor is stored in mm->exe_file.
>>> Probably the most straight forward implementation is to allow
>>> execveat(AT_EXE_FILE, ...).
>>
>> Cool, that works.
>>
>>> You can look at binfmt_misc for how to reopen an open file descriptor.
>>
>> Added to the todo heap.
> 
> Yes I don't think it would be a lot of code.
> 
> I think you might be better served with clone(CLONE_VM) as it doesn't
> block so you don't need to feed yourself your context over a pipe.

Except that doesn't fix it.

Yes I could use threads instead, but the cure is worse than the disease and the
result is your shell background processes are threads rather than independent
processes (is $$ reporting PID or TID, I really don't want to go there).

> Eric

Rob


* Re: [PATCH v2 7/8] exec: Generic execfd support
  2020-05-22  4:51                 ` Rob Landley
@ 2020-05-22 13:35                   ` Eric W. Biederman
  0 siblings, 0 replies; 122+ messages in thread
From: Eric W. Biederman @ 2020-05-22 13:35 UTC (permalink / raw)
  To: Rob Landley
  Cc: linux-kernel, Linus Torvalds, Oleg Nesterov, Jann Horn,
	Kees Cook, Greg Ungerer, Bernd Edlinger, linux-fsdevel, Al Viro,
	Alexey Dobriyan, Andrew Morton, Casey Schaufler,
	linux-security-module, James Morris, Serge E. Hallyn,
	Andy Lutomirski

Rob Landley <rob@landley.net> writes:

> On 5/21/20 10:28 PM, Eric W. Biederman wrote:
>> 
>> Rob Landley <rob@landley.net> writes:
>> 
>>> On 5/20/20 11:05 AM, Eric W. Biederman wrote:
>> 
>>>> The file descriptor is stored in mm->exe_file.
>>>> Probably the most straight forward implementation is to allow
>>>> execveat(AT_EXE_FILE, ...).
>>>
>>> Cool, that works.
>>>
>>>> You can look at binfmt_misc for how to reopen an open file descriptor.
>>>
>>> Added to the todo heap.
>> 
>> Yes I don't think it would be a lot of code.
>> 
>> I think you might be better served with clone(CLONE_VM) as it doesn't
>> block so you don't need to feed yourself your context over a pipe.
>
> Except that doesn't fix it.
>
> Yes I could use threads instead, but the cure is worse than the disease and the
> result is your shell background processes are threads rather than independent
> processes (is $$ reporting PID or TID, I really don't want to go
> there).

I was just suggesting clone(CLONE_VM) because it creates a thread in a
separate process.  Which on nommu sounds like it could be almost exactly
what you want.

If you need the separate copies of all of your global variables etc,
re-exec'ing yourself could be the easier way to go.

Eric



Thread overview: 122+ messages
2020-05-05 19:39 exec: Promised cleanups after introducing exec_update_mutex Eric W. Biederman
2020-05-05 19:41 ` [PATCH 1/7] binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf Eric W. Biederman
2020-05-05 20:45   ` Kees Cook
2020-05-06 12:42   ` Greg Ungerer
2020-05-06 12:56     ` Eric W. Biederman
2020-05-05 19:41 ` [PATCH 2/7] exec: Make unlocking exec_update_mutex explict Eric W. Biederman
2020-05-05 20:46   ` Kees Cook
2020-05-05 19:42 ` [PATCH 3/7] exec: Rename the flag called_exec_mmap point_of_no_return Eric W. Biederman
2020-05-05 20:49   ` Kees Cook
2020-05-05 19:43 ` [PATCH 4/7] exec: Merge install_exec_creds into setup_new_exec Eric W. Biederman
2020-05-05 20:50   ` Kees Cook
2020-05-05 19:44 ` [PATCH 5/7] exec: In setup_new_exec cache current in the local variable me Eric W. Biederman
2020-05-05 20:51   ` Kees Cook
2020-05-05 19:45 ` [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec Eric W. Biederman
2020-05-05 21:29   ` Kees Cook
2020-05-06 14:57     ` Eric W. Biederman
2020-05-06 15:30       ` Kees Cook
2020-05-07 19:51         ` Eric W. Biederman
2020-05-07 21:51     ` Eric W. Biederman
2020-05-08  5:50       ` Kees Cook
2020-05-05 19:46 ` [PATCH 7/7] exec: Rename flush_old_exec begin_new_exec Eric W. Biederman
2020-05-05 21:30   ` Kees Cook
2020-05-06 12:41 ` exec: Promised cleanups after introducing exec_update_mutex Greg Ungerer
2020-05-08 18:43 ` [PATCH 0/6] exec: Trivial cleanups for exec Eric W. Biederman
2020-05-08 18:44   ` [PATCH 1/6] exec: Move the comment from above de_thread to above unshare_sighand Eric W. Biederman
2020-05-09  5:02     ` Kees Cook
2020-05-08 18:44   ` [PATCH 2/6] exec: Fix spelling of search_binary_handler in a comment Eric W. Biederman
2020-05-09  5:03     ` Kees Cook
2020-05-08 18:45   ` [PATCH 3/6] exec: Stop open coding mutex_lock_killable of cred_guard_mutex Eric W. Biederman
2020-05-09  5:08     ` Kees Cook
2020-05-09 19:18     ` Linus Torvalds
2020-05-09 19:57       ` Eric W. Biederman
2020-05-10 20:33       ` Kees Cook
2020-05-08 18:45   ` [PATCH 4/6] exec: Run sync_mm_rss before taking exec_update_mutex Eric W. Biederman
2020-05-09  5:15     ` Kees Cook
2020-05-09 14:17       ` Eric W. Biederman
2020-05-08 18:47   ` [PATCH 5/6] exec: Move handling of the point of no return to the top level Eric W. Biederman
2020-05-09  5:31     ` Kees Cook
2020-05-09 13:39       ` Eric W. Biederman
2020-05-08 18:48   ` [PATCH 6/6] exec: Set the point of no return sooner Eric W. Biederman
2020-05-09  5:33     ` Kees Cook
2020-05-09 19:40   ` [PATCH 0/5] exec: Control flow simplifications Eric W. Biederman
2020-05-09 19:40     ` [PATCH 1/5] exec: Call cap_bprm_set_creds directly from prepare_binprm Eric W. Biederman
2020-05-09 20:04       ` Linus Torvalds
2020-05-09 19:41     ` [PATCH 2/5] exec: Directly call security_bprm_set_creds from __do_execve_file Eric W. Biederman
2020-05-09 20:07       ` Linus Torvalds
2020-05-09 20:12         ` Eric W. Biederman
2020-05-09 20:19           ` Linus Torvalds
2020-05-11  3:15       ` Kees Cook
2020-05-11 16:52         ` Eric W. Biederman
2020-05-11 21:18           ` Kees Cook
2020-05-09 19:41     ` [PATCH 3/5] exec: Remove recursion from search_binary_handler Eric W. Biederman
2020-05-09 20:16       ` Linus Torvalds
2020-05-10  4:22       ` Tetsuo Handa
2020-05-10 19:38         ` Linus Torvalds
2020-05-11 14:33           ` Eric W. Biederman
2020-05-11 19:10             ` Rob Landley
2020-05-13 21:59               ` Eric W. Biederman
2020-05-14 18:46                 ` Rob Landley
2020-05-11 21:55             ` Kees Cook
2020-05-12 18:42               ` Eric W. Biederman
2020-05-12 19:25                 ` Kees Cook
2020-05-12 20:31                   ` Eric W. Biederman
2020-05-12 23:08                     ` Kees Cook
2020-05-12 23:47                       ` Kees Cook
2020-05-12 23:51                         ` Kees Cook
2020-05-14 14:56                           ` Eric W. Biederman
2020-05-14 16:56                             ` Casey Schaufler
2020-05-14 17:02                               ` Eric W. Biederman
2020-05-13  0:20                 ` Linus Torvalds
2020-05-13  2:39                   ` Rob Landley
2020-05-13 19:51                     ` Linus Torvalds
2020-05-14 16:49                   ` Eric W. Biederman
2020-05-09 19:42     ` [PATCH 4/5] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
2020-05-11 22:09       ` Kees Cook
2020-05-09 19:42     ` [PATCH 5/5] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
2020-05-11 22:24       ` Kees Cook
2020-05-19  0:29     ` [PATCH v2 0/8] exec: Control flow simplifications Eric W. Biederman
2020-05-19  0:29       ` [PATCH v2 1/8] exec: Teach prepare_exec_creds how exec treats uids & gids Eric W. Biederman
2020-05-19 18:03         ` Kees Cook
2020-05-19 18:28           ` Linus Torvalds
2020-05-19 18:57             ` Eric W. Biederman
2020-05-19  0:30       ` [PATCH v2 2/8] exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds Eric W. Biederman
2020-05-19 15:34         ` Casey Schaufler
2020-05-19 18:10         ` Kees Cook
2020-05-19 21:28           ` James Morris
2020-05-19  0:31       ` [PATCH v2 3/8] exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds Eric W. Biederman
2020-05-19 18:21         ` Kees Cook
2020-05-19 19:03           ` Eric W. Biederman
2020-05-19 19:14             ` Kees Cook
2020-05-20 20:22               ` Eric W. Biederman
2020-05-20 20:53                 ` Kees Cook
2020-05-19 21:52         ` James Morris
2020-05-20 12:40           ` Eric W. Biederman
2020-05-19  0:31       ` [PATCH v2 4/8] exec: Allow load_misc_binary to call prepare_binfmt unconditionally Eric W. Biederman
2020-05-19 18:27         ` Kees Cook
2020-05-19 19:08           ` Eric W. Biederman
2020-05-19 19:17             ` Kees Cook
2020-05-19  0:32       ` [PATCH v2 5/8] exec: Move the call of prepare_binprm into search_binary_handler Eric W. Biederman
2020-05-19 18:27         ` Kees Cook
2020-05-19 21:30         ` James Morris
2020-05-19  0:33       ` [PATCH v2 6/8] exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC Eric W. Biederman
2020-05-19 19:08         ` Kees Cook
2020-05-19 19:19           ` Eric W. Biederman
2020-05-19  0:33       ` [PATCH v2 7/8] exec: Generic execfd support Eric W. Biederman
2020-05-19 19:46         ` Kees Cook
2020-05-19 19:54           ` Linus Torvalds
2020-05-19 20:20             ` Eric W. Biederman
2020-05-19 21:59         ` Rob Landley
2020-05-20 16:05           ` Eric W. Biederman
2020-05-21 22:50             ` Rob Landley
2020-05-22  3:28               ` Eric W. Biederman
2020-05-22  4:51                 ` Rob Landley
2020-05-22 13:35                   ` Eric W. Biederman
2020-05-19  0:34       ` [PATCH v2 8/8] exec: Remove recursion from search_binary_handler Eric W. Biederman
2020-05-19 20:37         ` Kees Cook
2020-05-19  1:25       ` [PATCH v2 0/8] exec: Control flow simplifications Linus Torvalds
2020-05-19 21:55       ` Kees Cook
2020-05-20 13:02         ` Eric W. Biederman
2020-05-20 22:12       ` Eric W. Biederman
2020-05-20 23:43         ` Kees Cook
2020-05-21 11:53           ` Eric W. Biederman
