linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jann@thejh.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Roland McGrath <roland@hack.frob.com>,
	John Johansen <john.johansen@canonical.com>,
	James Morris <james.l.morris@oracle.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Paul Moore <aul@paul-moore.com>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Eric Paris <eparis@parisplace.org>,
	Casey Schaufler <casey@schaufler-ca.com>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Janis Danisevskis <jdanis@google.com>,
	Seth Forshee <seth.forshee@canonical.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Benjamin LaHaise <bcrl@kvack.org>,
	Ben Hutchings <ben@decadent.org.uk>,
	Andy Lutomirski <luto@amacapital.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Krister Johansen <kjlx@templeofstupid.com>,
	linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, security@kernel.org
Subject: Re: [PATCH v3 1/8] exec: introduce cred_guard_light
Date: Sat, 5 Nov 2016 15:56:23 +0100	[thread overview]
Message-ID: <20161105145623.GA21207@redhat.com> (raw)
In-Reply-To: <20161104184505.GA21320@redhat.com>

On 11/04, Oleg Nesterov wrote:
>
> On 11/04, Oleg Nesterov wrote:
> >
> > On 11/04, Eric W. Biederman wrote:
> > >
> > > The following mostly correct patch modifies zap_other_threads in
> > > the case of a de_thread to not wait for zombies to be reaped.  The only
> > > case that cares is ptrace (as threads are self reaping).  So I don't
> > > think this will cause any problems except removing the strace -f race.
> >
> > From my previous email:
> >
> > 	So the only plan I currently have is change de_thread() to wait until
> > 	other threads pass exit_notify() or even exit_signals(), but I don't
> > 	like this.
> >
> > And yes, I don't like this, but perhaps this is what we should do.
> >
> > The patch is incomplete and racy (afaics), and the SIGNAL_GROUP_EXIT
> > checks doesn't look right, but off course technically this change should
> > be simple enough.
> >
> > But not that simple. Just for example, the exiting sub-threads should
> > not run with ->group_leader pointing to nowhere, in case it was reaped
> > by de_thread.
>
> Not to mention other potential problems outside of ptrace/exec. For example
> userns_install() can fail after mt-exec even without ptrace, simply because
> thread_group_empty() can be false. Sure, easy to fix, and probably _install()
> should use signal->live anyway, but still.
>
> And I didn't mention the fun with sighand unsharing. We simply can't do this
> until all sub-threads go away. IOW, your patch breaks the usage of ->siglock.
> The execing thread and the zombie threads will use different locks to, say,
> remove the task from thread-group. Again, this is fixable, but not that
> simple.
>
> > And we have another problem with PTRACE_EVENT_EXIT which can lead to the
> > same deadlock. Unfortunately, the semantics of PTRACE_EVENT_EXIT was never
> > defined. But this change will add the user-visible change.
> >
> > And if we add the user-visible changes, then perhaps we could simply untrace
> > the traced sub-threads on exec. This change is simple, we do not even need
> > to touch exec/de_thread, we could just change exit_notify() to ignore ->ptrace
> > if exec is in progress. But I'm afraid we can't do this.

So I was thinking about something like below. Untested, probably buggy/incomplete
too, but hopefully can work.

flush_old_exec() calls the new kill_sub_threads() helper which waits until
all the sub-threads pass exit_notify().

de_thread() is called after install_exec_creds(), it is simplified and waits
for thread_group_empty() without cred_guard_mutex.

But again, I do not really like this, and we need to do something with
PTRACE_EVENT_EXIT anyway, this needs another/separate change. User-visible.

And I disagree that this has nothing to do with cred_guard_mutex. And in any
case we should narrow its scope in do_execve() path. Why do we take it so early?
Why do we need to do, say, copy_strings() with this lock held? The original
motivation for this has gone, acct_arg_size() can work just fine even if
multiple threads call sys_execve().

I'll try to discuss the possible changes in LSM hooks with Jann, I still think
that this is what we actually need to do. At least try to do, possibly this is
too complicated.

Oleg.

 fs/binfmt_elf.c         |   6 ++-
 fs/exec.c               | 108 ++++++++++++++++++++++++------------------------
 include/linux/binfmts.h |   1 +
 kernel/exit.c           |   5 ++-
 kernel/signal.c         |   3 +-
 5 files changed, 65 insertions(+), 58 deletions(-)


--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -855,13 +855,17 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	setup_new_exec(bprm);
 	install_exec_creds(bprm);
 
+	retval = de_thread(current);
+	if (retval)
+		goto out_free_dentry;
+
 	/* Do this so that we can load the interpreter, if need be.  We will
 	   change some of these later */
 	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
 				 executable_stack);
 	if (retval < 0)
 		goto out_free_dentry;
-	
+
 	current->mm->start_stack = bprm->p;
 
 	/* Now we do a little grungy work by mmapping the ELF image into
diff --git a/fs/exec.c b/fs/exec.c
index 4e497b9..7246b9f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1036,13 +1036,59 @@ static int exec_mmap(struct mm_struct *mm)
 	return 0;
 }
 
+static int wait_for_notify_count(struct task_struct *tsk, struct signal_struct *sig)
+{
+	for (;;) {
+		if (unlikely(__fatal_signal_pending(tsk)))
+			goto killed;
+		set_current_state(TASK_KILLABLE);
+		if (!sig->notify_count)
+			break;
+		schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+	return 0;
+
+killed:
+	/* protects against exit_notify() and __exit_signal() */
+	read_lock(&tasklist_lock);
+	sig->group_exit_task = NULL;
+	sig->notify_count = 0;
+	read_unlock(&tasklist_lock);
+	return -EINTR;
+}
+
+static int kill_sub_threads(struct task_struct *tsk)
+{
+	struct signal_struct *sig = tsk->signal;
+	int err = -EINTR;
+
+	if (thread_group_empty(tsk))
+		return 0;
+
+	read_lock(&tasklist_lock);
+	spin_lock_irq(&tsk->sighand->siglock);
+	if (!signal_group_exit(sig)) {
+		sig->group_exit_task = tsk;
+		sig->notify_count = -zap_other_threads(tsk);
+		err = 0;
+	}
+	spin_unlock_irq(&tsk->sighand->siglock);
+	read_unlock(&tasklist_lock);
+
+	if (!err)
+		err = wait_for_notify_count(tsk, sig);
+	return err;
+
+}
+
 /*
  * This function makes sure the current process has its own signal table,
  * so that flush_signal_handlers can later reset the handlers without
  * disturbing other processes.  (Other processes might share the signal
  * table via the CLONE_SIGHAND option to clone().)
  */
-static int de_thread(struct task_struct *tsk)
+int de_thread(struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	struct sighand_struct *oldsighand = tsk->sighand;
@@ -1051,34 +1097,15 @@ static int de_thread(struct task_struct *tsk)
 	if (thread_group_empty(tsk))
 		goto no_thread_group;
 
-	/*
-	 * Kill all other threads in the thread group.
-	 */
 	spin_lock_irq(lock);
-	if (signal_group_exit(sig)) {
-		/*
-		 * Another group action in progress, just
-		 * return so that the signal is processed.
-		 */
-		spin_unlock_irq(lock);
-		return -EAGAIN;
-	}
-
-	sig->group_exit_task = tsk;
-	sig->notify_count = zap_other_threads(tsk);
+	sig->notify_count = sig->nr_threads;
 	if (!thread_group_leader(tsk))
 		sig->notify_count--;
-
-	while (sig->notify_count) {
-		__set_current_state(TASK_KILLABLE);
-		spin_unlock_irq(lock);
-		schedule();
-		if (unlikely(__fatal_signal_pending(tsk)))
-			goto killed;
-		spin_lock_irq(lock);
-	}
 	spin_unlock_irq(lock);
 
+	if (wait_for_notify_count(tsk, sig))
+		return -EINTR;
+
 	/*
 	 * At this point all other threads have exited, all we have to
 	 * do is to wait for the thread group leader to become inactive,
@@ -1087,24 +1114,8 @@ static int de_thread(struct task_struct *tsk)
 	if (!thread_group_leader(tsk)) {
 		struct task_struct *leader = tsk->group_leader;
 
-		for (;;) {
-			threadgroup_change_begin(tsk);
-			write_lock_irq(&tasklist_lock);
-			/*
-			 * Do this under tasklist_lock to ensure that
-			 * exit_notify() can't miss ->group_exit_task
-			 */
-			sig->notify_count = -1;
-			if (likely(leader->exit_state))
-				break;
-			__set_current_state(TASK_KILLABLE);
-			write_unlock_irq(&tasklist_lock);
-			threadgroup_change_end(tsk);
-			schedule();
-			if (unlikely(__fatal_signal_pending(tsk)))
-				goto killed;
-		}
-
+		threadgroup_change_begin(tsk);
+		write_lock_irq(&tasklist_lock);
 		/*
 		 * The only record we have of the real-time age of a
 		 * process, regardless of execs it's done, is start_time.
@@ -1162,10 +1173,9 @@ static int de_thread(struct task_struct *tsk)
 		release_task(leader);
 	}
 
+no_thread_group:
 	sig->group_exit_task = NULL;
 	sig->notify_count = 0;
-
-no_thread_group:
 	/* we have changed execution domain */
 	tsk->exit_signal = SIGCHLD;
 
@@ -1197,14 +1207,6 @@ static int de_thread(struct task_struct *tsk)
 
 	BUG_ON(!thread_group_leader(tsk));
 	return 0;
-
-killed:
-	/* protects against exit_notify() and __exit_signal() */
-	read_lock(&tasklist_lock);
-	sig->group_exit_task = NULL;
-	sig->notify_count = 0;
-	read_unlock(&tasklist_lock);
-	return -EAGAIN;
 }
 
 char *get_task_comm(char *buf, struct task_struct *tsk)
@@ -1239,7 +1241,7 @@ int flush_old_exec(struct linux_binprm * bprm)
 	 * Make sure we have a private signal table and that
 	 * we are unassociated from the previous thread group.
 	 */
-	retval = de_thread(current);
+	retval = kill_sub_threads(current);
 	if (retval)
 		goto out;
 
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -101,6 +101,7 @@ extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *);
 extern int flush_old_exec(struct linux_binprm * bprm);
 extern void setup_new_exec(struct linux_binprm * bprm);
+extern int de_thread(struct task_struct *tsk);
 extern void would_dump(struct linux_binprm *, struct file *);
 
 extern int suid_dumpable;
diff --git a/kernel/exit.c b/kernel/exit.c
index 9d68c45..f3dd46d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -690,8 +690,9 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
 	if (tsk->exit_state == EXIT_DEAD)
 		list_add(&tsk->ptrace_entry, &dead);
 
-	/* mt-exec, de_thread() is waiting for group leader */
-	if (unlikely(tsk->signal->notify_count < 0))
+	/* mt-exec, kill_sub_threads() is waiting for group exit */
+	if (unlikely(tsk->signal->notify_count < 0) &&
+	    !++tsk->signal->notify_count)
 		wake_up_process(tsk->signal->group_exit_task);
 	write_unlock_irq(&tasklist_lock);
 
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1194,13 +1194,12 @@ int zap_other_threads(struct task_struct *p)
 
 	while_each_thread(p, t) {
 		task_clear_jobctl_pending(t, JOBCTL_PENDING_MASK);
-		count++;
-
 		/* Don't bother with already dead threads */
 		if (t->exit_state)
 			continue;
 		sigaddset(&t->pending.signal, SIGKILL);
 		signal_wake_up(t, 1);
+		count++;
 	}
 
 	return count;


  reply	other threads:[~2016-11-05 14:56 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-30 21:46 [PATCH v3 0/8] Various fixes related to ptrace_may_access() Jann Horn
2016-10-30 21:46 ` [PATCH v3 1/8] exec: introduce cred_guard_light Jann Horn
2016-11-02 18:18   ` Oleg Nesterov
2016-11-02 20:50     ` Jann Horn
2016-11-02 21:38       ` Ben Hutchings
2016-11-02 21:54         ` Jann Horn
2016-11-03 18:12       ` Oleg Nesterov
2016-11-03 21:17         ` Jann Horn
2016-11-04 13:26         ` Eric W. Biederman
2016-11-04 15:00           ` Eric W. Biederman
2016-11-04 18:04             ` Oleg Nesterov
2016-11-04 18:45               ` Oleg Nesterov
2016-11-05 14:56                 ` Oleg Nesterov [this message]
2016-11-09  0:34                   ` Eric W. Biederman
2016-11-16 20:03                   ` Eric W. Biederman
2016-11-08 22:02                 ` Kees Cook
2016-11-08 22:46                   ` Eric W. Biederman
2016-11-08 22:56                     ` Benjamin LaHaise
2016-11-08 23:33                       ` Eric W. Biederman
2016-10-30 21:46 ` [PATCH v3 2/8] exec: add privunit to task_struct Jann Horn
2016-10-30 21:46 ` [PATCH v3 3/8] proc: use open()-time creds for ptrace checks Jann Horn
2016-10-30 21:46 ` [PATCH v3 4/8] futex: don't leak robust_list pointer Jann Horn
2016-10-30 21:46 ` [PATCH v3 5/8] proc: lock properly in ptrace_may_access callers Jann Horn
2016-10-30 21:46 ` [PATCH v3 6/8] fs/proc: fix attr access check Jann Horn
2016-10-30 21:46 ` [PATCH v3 7/8] proc: fix timerslack_ns handling Jann Horn
2016-10-30 21:46 ` [PATCH v3 8/8] Documentation: add security/ptrace_checks.txt Jann Horn
2016-11-01 23:57 ` [PATCH v3 0/8] Various fixes related to ptrace_may_access() Linus Torvalds
2016-11-02 18:38   ` Oleg Nesterov
2016-11-02 21:40     ` Jann Horn
2016-11-03 19:09   ` Andrew Morton
2016-11-03 20:01     ` Jann Horn
2016-11-04  0:57   ` James Morris

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161105145623.GA21207@redhat.com \
    --to=oleg@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aul@paul-moore.com \
    --cc=bcrl@kvack.org \
    --cc=ben@decadent.org.uk \
    --cc=casey@schaufler-ca.com \
    --cc=ebiederm@xmission.com \
    --cc=eparis@parisplace.org \
    --cc=james.l.morris@oracle.com \
    --cc=jann@thejh.net \
    --cc=jdanis@google.com \
    --cc=john.johansen@canonical.com \
    --cc=keescook@chromium.org \
    --cc=kjlx@templeofstupid.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=roland@hack.frob.com \
    --cc=sds@tycho.nsa.gov \
    --cc=security@kernel.org \
    --cc=serge@hallyn.com \
    --cc=seth.forshee@canonical.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).