linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Michael Schmitz <schmitzmic@gmail.com>,
	linux-arch <linux-arch@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Oleg Nesterov <oleg@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Richard Henderson <rth@twiddle.net>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	Matt Turner <mattst88@gmail.com>,
	alpha <linux-alpha@vger.kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	linux-m68k <linux-m68k@lists.linux-m68k.org>,
	Arnd Bergmann <arnd@kernel.org>,
	Ley Foon Tan <ley.foon.tan@intel.com>, Tejun Heo <tj@kernel.org>,
	Kees Cook <keescook@chromium.org>
Subject: Re: Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads
Date: Tue, 22 Jun 2021 15:52:18 -0500	[thread overview]
Message-ID: <87tulpbp19.fsf@disp2133> (raw)
In-Reply-To: <CAHk-=wh4_iMRmWcao6a8kCvR0Hhdrz+M9L+q4Bfcwx9E9D0huw@mail.gmail.com> (Linus Torvalds's message of "Mon, 21 Jun 2021 16:15:37 -0700")

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, Jun 21, 2021 at 1:04 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> For other ptrace_event calls I am playing with seeing if I can split
>> them in two.  Like sending a signal.  So that we can have perform all
>> of the work in get_signal.
>
> That sounds like the right model, but I don't think it works.
> Particularly not for exit(). The second phase will never happen.

Playing with it some more I think I have everything working working
except for PTRACE_EVENT_SECCOMP (which can stay ptrace_event) and
group_exit(2).

Basically in exit sending yourself a signal and then calling do_exit
from the signal handler is not unreasonable, as exit is an ordinary
system call.

I haven't seen anything that ``knows'' that exit(2) or exit_group(2)
will never return and adds a special case in the system call table for
that case.

The complications of exit_group(2) are roughly those of moving
ptrace_event out of do_exit.   They look doable and I am going to look
at that next.

This is not to say that this is the most maintainable way or that we
necessarily want to implement things this way, but I need to look and
see what it looks like.

For purposes of discussion this is my current draft implementation.

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d2c881384517..891812d32b90 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1087,6 +1087,7 @@ struct task_struct {
 	struct capture_control		*capture_control;
 #endif
 	/* Ptrace state: */
+	int				stop_code;
 	unsigned long			ptrace_message;
 	kernel_siginfo_t		*last_siginfo;
 
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index b5ebf6c01292..33c50119b193 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -164,18 +164,29 @@ static inline void ptrace_event(int event, unsigned long message)
 	}
 }
 
+static inline bool ptrace_post_event(int event, unsigned long message)
+{
+	bool posted = false;
+	if (unlikely(ptrace_event_enabled(current, event))) {
+		current->ptrace_message = message;
+		current->stop_code = (event << 8) | SIGTRAP;
+		set_tsk_thread_flag(current, TIF_SIGPENDING);
+		posted = true;
+	} else if (event == PTRACE_EVENT_EXEC) {
+		/* legacy EXEC report via SIGTRAP */
+		if ((current->ptrace & (PT_PTRACED|PT_SEIZED)) == PT_PTRACED)
+			send_sig(SIGTRAP, current, 0);
+	}
+	return posted;
+}
+
 /**
- * ptrace_event_pid - possibly stop for a ptrace event notification
- * @event:	%PTRACE_EVENT_* value to report
- * @pid:	process identifier for %PTRACE_GETEVENTMSG to return
- *
- * Check whether @event is enabled and, if so, report @event and @pid
- * to the ptrace parent.  @pid is reported as the pid_t seen from the
- * ptrace parent's pid namespace.
+ * pid_parent_nr - Return the number the parent knows this pid as
+ * @pid:	The struct pid whose numerical value we want
  *
  * Called without locks.
  */
-static inline void ptrace_event_pid(int event, struct pid *pid)
+static inline pid_t pid_parent_nr(struct pid *pid)
 {
 	/*
 	 * FIXME: There's a potential race if a ptracer in a different pid
@@ -183,16 +194,15 @@ static inline void ptrace_event_pid(int event, struct pid *pid)
 	 * when we acquire tasklist_lock in ptrace_stop().  If this happens,
 	 * the ptracer will get a bogus pid from PTRACE_GETEVENTMSG.
 	 */
-	unsigned long message = 0;
+	pid_t nr = 0;
 	struct pid_namespace *ns;
 
 	rcu_read_lock();
 	ns = task_active_pid_ns(rcu_dereference(current->parent));
 	if (ns)
-		message = pid_nr_ns(pid, ns);
+		nr = pid_nr_ns(pid, ns);
 	rcu_read_unlock();
-
-	ptrace_event(event, message);
+	return nr;
 }
 
 /**
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index e24b1fe348e3..a2eac3831369 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -97,6 +97,8 @@ extern void exit_mm_release(struct task_struct *, struct mm_struct *);
 /* Remove the current tasks stale references to the old mm_struct on exec() */
 extern void exec_mm_release(struct task_struct *, struct mm_struct *);
 
+extern int wait_for_vfork_done(struct task_struct *child, struct completion *vfork);
+
 #ifdef CONFIG_MEMCG
 extern void mm_update_next_owner(struct mm_struct *mm);
 #else
diff --git a/fs/exec.c b/fs/exec.c
index 18594f11c31f..bb4751d84e2d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1781,7 +1781,7 @@ static int exec_binprm(struct linux_binprm *bprm)
 
 	audit_bprm(bprm);
 	trace_sched_process_exec(current, old_pid, bprm);
-	ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
+	ptrace_post_event(PTRACE_EVENT_EXEC, old_vpid);
 	proc_exec_connector(current);
 	return 0;
 }
diff --git a/kernel/exit.c b/kernel/exit.c
index fd1c04193e18..aeb22a8e4d24 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -889,7 +889,9 @@ EXPORT_SYMBOL(complete_and_exit);
 
 SYSCALL_DEFINE1(exit, int, error_code)
 {
-	do_exit((error_code&0xff)<<8);
+	long code = (error_code&0xff)<<8;
+	if (!ptrace_post_event(PTRACE_EVENT_EXIT, code))
+		do_exit((error_code&0xff)<<8);
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index dc06afd725cb..8533e056a3d6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1266,8 +1266,7 @@ static void complete_vfork_done(struct task_struct *tsk)
 	task_unlock(tsk);
 }
 
-static int wait_for_vfork_done(struct task_struct *child,
-				struct completion *vfork)
+int wait_for_vfork_done(struct task_struct *child, struct completion *vfork)
 {
 	int killed;
 
@@ -2278,7 +2277,8 @@ static __latent_entropy struct task_struct *copy_process(
 
 	init_task_pid_links(p);
 	if (likely(p->pid)) {
-		ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);
+		ptrace_init_task(p, (clone_flags & CLONE_PTRACE) ||
+				 (trace && ptrace_event_enabled(current, trace)));
 
 		init_task_pid(p, PIDTYPE_PID, pid);
 		if (thread_group_leader(p)) {
@@ -2462,7 +2462,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
 pid_t kernel_clone(struct kernel_clone_args *args)
 {
 	u64 clone_flags = args->flags;
-	struct completion vfork;
+	unsigned long message;
 	struct pid *pid;
 	struct task_struct *p;
 	int trace = 0;
@@ -2495,9 +2495,6 @@ pid_t kernel_clone(struct kernel_clone_args *args)
 			trace = PTRACE_EVENT_CLONE;
 		else
 			trace = PTRACE_EVENT_FORK;
-
-		if (likely(!ptrace_event_enabled(current, trace)))
-			trace = 0;
 	}
 
 	p = copy_process(NULL, trace, NUMA_NO_NODE, args);
@@ -2512,30 +2509,27 @@ pid_t kernel_clone(struct kernel_clone_args *args)
 	 */
 	trace_sched_process_fork(current, p);
 
-	pid = get_task_pid(p, PIDTYPE_PID);
+	pid = task_pid(p);
 	nr = pid_vnr(pid);
+	message = pid_parent_nr(pid);
 
 	if (clone_flags & CLONE_PARENT_SETTID)
 		put_user(nr, args->parent_tid);
 
-	if (clone_flags & CLONE_VFORK) {
-		p->vfork_done = &vfork;
+	if (!(clone_flags & CLONE_VFORK)) {
+		wake_up_new_task(p);
+		ptrace_post_event(trace, message);
+	}
+	else if (!ptrace_post_event(PTRACE_EVENT_VFORK, (unsigned long)p)) {
+		struct completion vfork;
 		init_completion(&vfork);
+		p->vfork_done = &vfork;
 		get_task_struct(p);
+		wake_up_new_task(p);
+		if (wait_for_vfork_done(p, &vfork))
+			ptrace_post_event(PTRACE_EVENT_VFORK_DONE, message);
 	}
 
-	wake_up_new_task(p);
-
-	/* forking complete and child started to run, tell ptracer */
-	if (unlikely(trace))
-		ptrace_event_pid(trace, pid);
-
-	if (clone_flags & CLONE_VFORK) {
-		if (!wait_for_vfork_done(p, &vfork))
-			ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
-	}
-
-	put_pid(pid);
 	return nr;
 }
 
diff --git a/kernel/signal.c b/kernel/signal.c
index f7c6ffcbd044..8ac8c4a31d88 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -155,7 +155,8 @@ static bool recalc_sigpending_tsk(struct task_struct *t)
 	if ((t->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) ||
 	    PENDING(&t->pending, &t->blocked) ||
 	    PENDING(&t->signal->shared_pending, &t->blocked) ||
-	    cgroup_task_frozen(t)) {
+	    cgroup_task_frozen(t) ||
+	    t->stop_code) {
 		set_tsk_thread_flag(t, TIF_SIGPENDING);
 		return true;
 	}
@@ -2607,6 +2608,39 @@ bool get_signal(struct ksignal *ksig)
 	if (unlikely(current->task_works))
 		task_work_run();
 
+ptrace_event:
+	/* Handle a posted ptrace event */
+	if (unlikely(current->stop_code)) {
+		int stop_code = current->stop_code;
+		unsigned long message = current->ptrace_message;
+		struct completion vfork;
+		struct task_struct *p;
+
+		current->stop_code = 0;
+
+		if (stop_code == PTRACE_EVENT_VFORK) {
+			p = (struct task_struct *)message;
+			get_task_struct(p);
+			current->ptrace_message = pid_parent_nr(task_pid(p));
+			init_completion(&vfork);
+			p->vfork_done = &vfork;
+			wake_up_new_task(p);
+		}
+
+		spin_lock_irq(&sighand->siglock);
+		ptrace_do_notify(SIGTRAP, stop_code, CLD_TRAPPED);
+		spin_unlock_irq(&sighand->siglock);
+
+		if ((stop_code == PTRACE_EVENT_VFORK) &&
+		    wait_for_vfork_done(p, &vfork) &&
+		    ptrace_post_event(PTRACE_EVENT_VFORK_DONE, message))
+			goto ptrace_event;
+
+		if (stop_code == PTRACE_EVENT_EXIT) {
+			do_exit(message);
+		}
+	}
+
 	/*
 	 * For non-generic architectures, check for TIF_NOTIFY_SIGNAL so
 	 * that the arch handlers don't all have to do it. If we get here

Eric

  reply	other threads:[~2021-06-22 20:53 UTC|newest]

Thread overview: 112+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10 20:57 Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads Eric W. Biederman
2021-06-10 22:04 ` Linus Torvalds
2021-06-11 21:39   ` Eric W. Biederman
2021-06-11 23:26     ` Linus Torvalds
2021-06-13 21:54       ` Eric W. Biederman
2021-06-13 22:18         ` Linus Torvalds
2021-06-14  2:05           ` Michael Schmitz
2021-06-14  5:03             ` Michael Schmitz
2021-06-14 16:26               ` Eric W. Biederman
2021-06-14 22:26                 ` Michael Schmitz
2021-06-15 19:30                   ` Eric W. Biederman
2021-06-15 19:36                     ` [PATCH] alpha: Add extra switch_stack frames in exit, exec, and kernel threads Eric W. Biederman
2021-06-15 22:02                       ` Linus Torvalds
2021-06-16 16:32                         ` Eric W. Biederman
2021-06-16 18:29                           ` [PATCH 0/2] alpha/ptrace: Improved switch_stack handling Eric W. Biederman
2021-06-16 18:31                             ` [PATCH 1/2] alpha/ptrace: Record and handle the absence of switch_stack Eric W. Biederman
2021-06-16 20:00                               ` Linus Torvalds
2021-06-16 20:37                                 ` Linus Torvalds
2021-06-16 20:57                                   ` Eric W. Biederman
2021-06-16 21:02                                     ` Al Viro
2021-06-16 21:08                                     ` Linus Torvalds
2021-06-16 20:42                                 ` Eric W. Biederman
2021-06-16 20:17                               ` Al Viro
2021-06-21  2:01                               ` Michael Schmitz
2021-06-21  2:17                                 ` Linus Torvalds
2021-06-21  3:18                                   ` Michael Schmitz
2021-06-21  3:37                                     ` Linus Torvalds
2021-06-21  4:08                                       ` Michael Schmitz
2021-06-21  3:44                                     ` Al Viro
2021-06-21  5:31                                       ` Michael Schmitz
2021-06-21  2:27                                 ` Al Viro
2021-06-21  3:36                                   ` Michael Schmitz
2021-06-16 18:32                             ` [PATCH 2/2] alpha/ptrace: Add missing switch_stack frames Eric W. Biederman
2021-06-16 20:25                               ` Al Viro
2021-06-16 20:28                                 ` Al Viro
2021-06-16 20:49                                   ` Eric W. Biederman
2021-06-16 20:54                                     ` Al Viro
2021-06-16 20:47                                 ` Eric W. Biederman
2021-06-16 20:55                                   ` Al Viro
2021-06-16 20:50                       ` [PATCH] alpha: Add extra switch_stack frames in exit, exec, and kernel threads Al Viro
2021-06-15 20:56                     ` Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads Michael Schmitz
2021-06-16  0:23                       ` Finn Thain
2021-06-15 21:58                     ` Linus Torvalds
2021-06-16 15:06                       ` Eric W. Biederman
2021-06-21 13:54                       ` Al Viro
2021-06-21 14:16                         ` Al Viro
2021-06-21 16:50                           ` Eric W. Biederman
2021-06-21 23:05                             ` Al Viro
2021-06-22 16:39                               ` Eric W. Biederman
2021-06-21 15:38                         ` Linus Torvalds
2021-06-21 18:59                         ` Al Viro
2021-06-21 19:22                           ` Linus Torvalds
2021-06-21 19:45                             ` Al Viro
2021-06-21 23:14                               ` Linus Torvalds
2021-06-21 23:23                                 ` Al Viro
2021-06-21 23:36                                   ` Linus Torvalds
2021-06-22 21:02                                     ` Eric W. Biederman
2021-06-22 21:48                                       ` Michael Schmitz
2021-06-23  5:26                                         ` Michael Schmitz
2021-06-23 14:36                                           ` Eric W. Biederman
2021-06-22  0:01                                 ` Michael Schmitz
2021-06-22 20:04                                 ` Michael Schmitz
2021-06-22 20:18                                   ` Al Viro
2021-06-22 21:57                                     ` Michael Schmitz
2021-06-21 20:03                             ` Eric W. Biederman
2021-06-21 23:15                               ` Linus Torvalds
2021-06-22 20:52                                 ` Eric W. Biederman [this message]
2021-06-23  0:41                                   ` Linus Torvalds
2021-06-23 14:33                                     ` Eric W. Biederman
2021-06-24 18:57                                       ` [PATCH 0/9] Refactoring exit Eric W. Biederman
2021-06-24 18:59                                         ` [PATCH 1/9] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL) Eric W. Biederman
2021-06-24 18:59                                         ` [PATCH 2/9] signal/seccomp: Refactor seccomp signal and coredump generation Eric W. Biederman
2021-06-26  3:17                                           ` Kees Cook
2021-06-28 19:21                                             ` Eric W. Biederman
2021-06-28 14:34                                           ` [signal/seccomp] 3fdd8c68c2: kernel-selftests.seccomp.seccomp_bpf.fail kernel test robot
2021-06-24 19:00                                         ` [PATCH 3/9] signal/seccomp: Dump core when there is only one live thread Eric W. Biederman
2021-06-26  3:20                                           ` Kees Cook
2021-06-24 19:01                                         ` [PATCH 4/9] signal: Factor start_group_exit out of complete_signal Eric W. Biederman
2021-06-24 20:04                                           ` Linus Torvalds
2021-06-25  8:47                                           ` kernel test robot
2021-06-26  3:24                                           ` Kees Cook
2021-06-24 19:01                                         ` [PATCH 5/9] signal/group_exit: Use start_group_exit in place of do_group_exit Eric W. Biederman
2021-06-26  3:35                                           ` Kees Cook
2021-06-24 19:02                                         ` [PATCH 6/9] signal: Fold do_group_exit into get_signal fixing io_uring threads Eric W. Biederman
2021-06-26  3:42                                           ` Kees Cook
2021-06-28 19:25                                             ` Eric W. Biederman
2021-06-24 19:02                                         ` [PATCH 7/9] signal: Make individual tasks exiting a first class concept Eric W. Biederman
2021-06-24 20:11                                           ` Linus Torvalds
2021-06-24 21:37                                             ` Eric W. Biederman
2021-06-24 19:03                                         ` [PATCH 8/9] signal/task_exit: Use start_task_exit in place of do_exit Eric W. Biederman
2021-06-26  5:56                                           ` Kees Cook
2021-06-24 19:03                                         ` [PATCH 9/9] signal: Move PTRACE_EVENT_EXIT into get_signal Eric W. Biederman
2021-06-24 22:45                                         ` [PATCH 0/9] Refactoring exit Al Viro
2021-06-27 22:13                                           ` Al Viro
2021-06-27 22:59                                             ` Michael Schmitz
2021-06-28  7:31                                               ` Geert Uytterhoeven
2021-06-28 16:20                                                 ` Eric W. Biederman
2021-06-28 17:14                                                 ` Michael Schmitz
2021-06-28 19:17                                                   ` Geert Uytterhoeven
2021-06-28 20:13                                                     ` Michael Schmitz
2021-06-28 21:18                                                       ` Geert Uytterhoeven
2021-06-28 23:42                                                         ` Michael Schmitz
2021-06-29 20:28                                                           ` [CFT][PATCH] exit/bdflush: Remove the deprecated bdflush system call Eric W. Biederman
2021-06-29 21:45                                                             ` Michael Schmitz
2021-06-30  8:24                                                             ` Geert Uytterhoeven
2021-06-30  8:37                                                             ` Arnd Bergmann
2021-06-30 12:30                                                             ` Cyril Hrubis
2021-06-28 19:02                                           ` [PATCH 0/9] Refactoring exit Eric W. Biederman
2021-06-21 19:24                           ` Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads Al Viro
2021-06-21 23:24                             ` Michael Schmitz
2021-06-16  7:38                     ` Geert Uytterhoeven
2021-06-16 19:40                       ` Michael Schmitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tulpbp19.fsf@disp2133 \
    --to=ebiederm@xmission.com \
    --cc=arnd@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=geert@linux-m68k.org \
    --cc=ink@jurassic.park.msu.ru \
    --cc=keescook@chromium.org \
    --cc=ley.foon.tan@intel.com \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=mattst88@gmail.com \
    --cc=oleg@redhat.com \
    --cc=rth@twiddle.net \
    --cc=schmitzmic@gmail.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).