From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
Michael Schmitz <schmitzmic@gmail.com>,
linux-arch <linux-arch@vger.kernel.org>,
Jens Axboe <axboe@kernel.dk>, Oleg Nesterov <oleg@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Richard Henderson <rth@twiddle.net>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
Matt Turner <mattst88@gmail.com>,
alpha <linux-alpha@vger.kernel.org>,
Geert Uytterhoeven <geert@linux-m68k.org>,
linux-m68k <linux-m68k@lists.linux-m68k.org>,
Arnd Bergmann <arnd@kernel.org>,
Ley Foon Tan <ley.foon.tan@intel.com>, Tejun Heo <tj@kernel.org>,
Kees Cook <keescook@chromium.org>
Subject: Re: Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads
Date: Tue, 22 Jun 2021 15:52:18 -0500 [thread overview]
Message-ID: <87tulpbp19.fsf@disp2133> (raw)
In-Reply-To: <CAHk-=wh4_iMRmWcao6a8kCvR0Hhdrz+M9L+q4Bfcwx9E9D0huw@mail.gmail.com> (Linus Torvalds's message of "Mon, 21 Jun 2021 16:15:37 -0700")
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Mon, Jun 21, 2021 at 1:04 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> For other ptrace_event calls I am playing with seeing if I can split
>> them in two. Like sending a signal. So that we can have perform all
>> of the work in get_signal.
>
> That sounds like the right model, but I don't think it works.
> Particularly not for exit(). The second phase will never happen.
Playing with it some more I think I have everything working working
except for PTRACE_EVENT_SECCOMP (which can stay ptrace_event) and
group_exit(2).
Basically in exit sending yourself a signal and then calling do_exit
from the signal handler is not unreasonable, as exit is an ordinary
system call.
I haven't seen anything that ``knows'' that exit(2) or exit_group(2)
will never return and adds a special case in the system call table for
that case.
The complications of exit_group(2) are roughly those of moving
ptrace_event out of do_exit. They look doable and I am going to look
at that next.
This is not to say that this is the most maintainable way or that we
necessarily want to implement things this way, but I need to look and
see what it looks like.
For purposes of discussion this is my current draft implementation.
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d2c881384517..891812d32b90 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1087,6 +1087,7 @@ struct task_struct {
struct capture_control *capture_control;
#endif
/* Ptrace state: */
+ int stop_code;
unsigned long ptrace_message;
kernel_siginfo_t *last_siginfo;
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index b5ebf6c01292..33c50119b193 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -164,18 +164,29 @@ static inline void ptrace_event(int event, unsigned long message)
}
}
+static inline bool ptrace_post_event(int event, unsigned long message)
+{
+ bool posted = false;
+ if (unlikely(ptrace_event_enabled(current, event))) {
+ current->ptrace_message = message;
+ current->stop_code = (event << 8) | SIGTRAP;
+ set_tsk_thread_flag(current, TIF_SIGPENDING);
+ posted = true;
+ } else if (event == PTRACE_EVENT_EXEC) {
+ /* legacy EXEC report via SIGTRAP */
+ if ((current->ptrace & (PT_PTRACED|PT_SEIZED)) == PT_PTRACED)
+ send_sig(SIGTRAP, current, 0);
+ }
+ return posted;
+}
+
/**
- * ptrace_event_pid - possibly stop for a ptrace event notification
- * @event: %PTRACE_EVENT_* value to report
- * @pid: process identifier for %PTRACE_GETEVENTMSG to return
- *
- * Check whether @event is enabled and, if so, report @event and @pid
- * to the ptrace parent. @pid is reported as the pid_t seen from the
- * ptrace parent's pid namespace.
+ * pid_parent_nr - Return the number the parent knows this pid as
+ * @pid: The struct pid whose numerical value we want
*
* Called without locks.
*/
-static inline void ptrace_event_pid(int event, struct pid *pid)
+static inline pid_t pid_parent_nr(struct pid *pid)
{
/*
* FIXME: There's a potential race if a ptracer in a different pid
@@ -183,16 +194,15 @@ static inline void ptrace_event_pid(int event, struct pid *pid)
* when we acquire tasklist_lock in ptrace_stop(). If this happens,
* the ptracer will get a bogus pid from PTRACE_GETEVENTMSG.
*/
- unsigned long message = 0;
+ pid_t nr = 0;
struct pid_namespace *ns;
rcu_read_lock();
ns = task_active_pid_ns(rcu_dereference(current->parent));
if (ns)
- message = pid_nr_ns(pid, ns);
+ nr = pid_nr_ns(pid, ns);
rcu_read_unlock();
-
- ptrace_event(event, message);
+ return nr;
}
/**
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index e24b1fe348e3..a2eac3831369 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -97,6 +97,8 @@ extern void exit_mm_release(struct task_struct *, struct mm_struct *);
/* Remove the current tasks stale references to the old mm_struct on exec() */
extern void exec_mm_release(struct task_struct *, struct mm_struct *);
+extern int wait_for_vfork_done(struct task_struct *child, struct completion *vfork);
+
#ifdef CONFIG_MEMCG
extern void mm_update_next_owner(struct mm_struct *mm);
#else
diff --git a/fs/exec.c b/fs/exec.c
index 18594f11c31f..bb4751d84e2d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1781,7 +1781,7 @@ static int exec_binprm(struct linux_binprm *bprm)
audit_bprm(bprm);
trace_sched_process_exec(current, old_pid, bprm);
- ptrace_event(PTRACE_EVENT_EXEC, old_vpid);
+ ptrace_post_event(PTRACE_EVENT_EXEC, old_vpid);
proc_exec_connector(current);
return 0;
}
diff --git a/kernel/exit.c b/kernel/exit.c
index fd1c04193e18..aeb22a8e4d24 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -889,7 +889,9 @@ EXPORT_SYMBOL(complete_and_exit);
SYSCALL_DEFINE1(exit, int, error_code)
{
- do_exit((error_code&0xff)<<8);
+ long code = (error_code&0xff)<<8;
+ if (!ptrace_post_event(PTRACE_EVENT_EXIT, code))
+ do_exit((error_code&0xff)<<8);
}
/*
diff --git a/kernel/fork.c b/kernel/fork.c
index dc06afd725cb..8533e056a3d6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1266,8 +1266,7 @@ static void complete_vfork_done(struct task_struct *tsk)
task_unlock(tsk);
}
-static int wait_for_vfork_done(struct task_struct *child,
- struct completion *vfork)
+int wait_for_vfork_done(struct task_struct *child, struct completion *vfork)
{
int killed;
@@ -2278,7 +2277,8 @@ static __latent_entropy struct task_struct *copy_process(
init_task_pid_links(p);
if (likely(p->pid)) {
- ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);
+ ptrace_init_task(p, (clone_flags & CLONE_PTRACE) ||
+ (trace && ptrace_event_enabled(current, trace)));
init_task_pid(p, PIDTYPE_PID, pid);
if (thread_group_leader(p)) {
@@ -2462,7 +2462,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node)
pid_t kernel_clone(struct kernel_clone_args *args)
{
u64 clone_flags = args->flags;
- struct completion vfork;
+ unsigned long message;
struct pid *pid;
struct task_struct *p;
int trace = 0;
@@ -2495,9 +2495,6 @@ pid_t kernel_clone(struct kernel_clone_args *args)
trace = PTRACE_EVENT_CLONE;
else
trace = PTRACE_EVENT_FORK;
-
- if (likely(!ptrace_event_enabled(current, trace)))
- trace = 0;
}
p = copy_process(NULL, trace, NUMA_NO_NODE, args);
@@ -2512,30 +2509,27 @@ pid_t kernel_clone(struct kernel_clone_args *args)
*/
trace_sched_process_fork(current, p);
- pid = get_task_pid(p, PIDTYPE_PID);
+ pid = task_pid(p);
nr = pid_vnr(pid);
+ message = pid_parent_nr(pid);
if (clone_flags & CLONE_PARENT_SETTID)
put_user(nr, args->parent_tid);
- if (clone_flags & CLONE_VFORK) {
- p->vfork_done = &vfork;
+ if (!(clone_flags & CLONE_VFORK)) {
+ wake_up_new_task(p);
+ ptrace_post_event(trace, message);
+ }
+ else if (!ptrace_post_event(PTRACE_EVENT_VFORK, (unsigned long)p)) {
+ struct completion vfork;
init_completion(&vfork);
+ p->vfork_done = &vfork;
get_task_struct(p);
+ wake_up_new_task(p);
+ if (wait_for_vfork_done(p, &vfork))
+ ptrace_post_event(PTRACE_EVENT_VFORK_DONE, message);
}
- wake_up_new_task(p);
-
- /* forking complete and child started to run, tell ptracer */
- if (unlikely(trace))
- ptrace_event_pid(trace, pid);
-
- if (clone_flags & CLONE_VFORK) {
- if (!wait_for_vfork_done(p, &vfork))
- ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
- }
-
- put_pid(pid);
return nr;
}
diff --git a/kernel/signal.c b/kernel/signal.c
index f7c6ffcbd044..8ac8c4a31d88 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -155,7 +155,8 @@ static bool recalc_sigpending_tsk(struct task_struct *t)
if ((t->jobctl & (JOBCTL_PENDING_MASK | JOBCTL_TRAP_FREEZE)) ||
PENDING(&t->pending, &t->blocked) ||
PENDING(&t->signal->shared_pending, &t->blocked) ||
- cgroup_task_frozen(t)) {
+ cgroup_task_frozen(t) ||
+ t->stop_code) {
set_tsk_thread_flag(t, TIF_SIGPENDING);
return true;
}
@@ -2607,6 +2608,39 @@ bool get_signal(struct ksignal *ksig)
if (unlikely(current->task_works))
task_work_run();
+ptrace_event:
+ /* Handle a posted ptrace event */
+ if (unlikely(current->stop_code)) {
+ int stop_code = current->stop_code;
+ unsigned long message = current->ptrace_message;
+ struct completion vfork;
+ struct task_struct *p;
+
+ current->stop_code = 0;
+
+ if (stop_code == PTRACE_EVENT_VFORK) {
+ p = (struct task_struct *)message;
+ get_task_struct(p);
+ current->ptrace_message = pid_parent_nr(task_pid(p));
+ init_completion(&vfork);
+ p->vfork_done = &vfork;
+ wake_up_new_task(p);
+ }
+
+ spin_lock_irq(&sighand->siglock);
+ ptrace_do_notify(SIGTRAP, stop_code, CLD_TRAPPED);
+ spin_unlock_irq(&sighand->siglock);
+
+ if ((stop_code == PTRACE_EVENT_VFORK) &&
+ wait_for_vfork_done(p, &vfork) &&
+ ptrace_post_event(PTRACE_EVENT_VFORK_DONE, message))
+ goto ptrace_event;
+
+ if (stop_code == PTRACE_EVENT_EXIT) {
+ do_exit(message);
+ }
+ }
+
/*
* For non-generic architectures, check for TIF_NOTIFY_SIGNAL so
* that the arch handlers don't all have to do it. If we get here
Eric
next prev parent reply other threads:[~2021-06-22 20:53 UTC|newest]
Thread overview: 112+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-10 20:57 Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads Eric W. Biederman
2021-06-10 22:04 ` Linus Torvalds
2021-06-11 21:39 ` Eric W. Biederman
2021-06-11 23:26 ` Linus Torvalds
2021-06-13 21:54 ` Eric W. Biederman
2021-06-13 22:18 ` Linus Torvalds
2021-06-14 2:05 ` Michael Schmitz
2021-06-14 5:03 ` Michael Schmitz
2021-06-14 16:26 ` Eric W. Biederman
2021-06-14 22:26 ` Michael Schmitz
2021-06-15 19:30 ` Eric W. Biederman
2021-06-15 19:36 ` [PATCH] alpha: Add extra switch_stack frames in exit, exec, and kernel threads Eric W. Biederman
2021-06-15 22:02 ` Linus Torvalds
2021-06-16 16:32 ` Eric W. Biederman
2021-06-16 18:29 ` [PATCH 0/2] alpha/ptrace: Improved switch_stack handling Eric W. Biederman
2021-06-16 18:31 ` [PATCH 1/2] alpha/ptrace: Record and handle the absence of switch_stack Eric W. Biederman
2021-06-16 20:00 ` Linus Torvalds
2021-06-16 20:37 ` Linus Torvalds
2021-06-16 20:57 ` Eric W. Biederman
2021-06-16 21:02 ` Al Viro
2021-06-16 21:08 ` Linus Torvalds
2021-06-16 20:42 ` Eric W. Biederman
2021-06-16 20:17 ` Al Viro
2021-06-21 2:01 ` Michael Schmitz
2021-06-21 2:17 ` Linus Torvalds
2021-06-21 3:18 ` Michael Schmitz
2021-06-21 3:37 ` Linus Torvalds
2021-06-21 4:08 ` Michael Schmitz
2021-06-21 3:44 ` Al Viro
2021-06-21 5:31 ` Michael Schmitz
2021-06-21 2:27 ` Al Viro
2021-06-21 3:36 ` Michael Schmitz
2021-06-16 18:32 ` [PATCH 2/2] alpha/ptrace: Add missing switch_stack frames Eric W. Biederman
2021-06-16 20:25 ` Al Viro
2021-06-16 20:28 ` Al Viro
2021-06-16 20:49 ` Eric W. Biederman
2021-06-16 20:54 ` Al Viro
2021-06-16 20:47 ` Eric W. Biederman
2021-06-16 20:55 ` Al Viro
2021-06-16 20:50 ` [PATCH] alpha: Add extra switch_stack frames in exit, exec, and kernel threads Al Viro
2021-06-15 20:56 ` Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads Michael Schmitz
2021-06-16 0:23 ` Finn Thain
2021-06-15 21:58 ` Linus Torvalds
2021-06-16 15:06 ` Eric W. Biederman
2021-06-21 13:54 ` Al Viro
2021-06-21 14:16 ` Al Viro
2021-06-21 16:50 ` Eric W. Biederman
2021-06-21 23:05 ` Al Viro
2021-06-22 16:39 ` Eric W. Biederman
2021-06-21 15:38 ` Linus Torvalds
2021-06-21 18:59 ` Al Viro
2021-06-21 19:22 ` Linus Torvalds
2021-06-21 19:45 ` Al Viro
2021-06-21 23:14 ` Linus Torvalds
2021-06-21 23:23 ` Al Viro
2021-06-21 23:36 ` Linus Torvalds
2021-06-22 21:02 ` Eric W. Biederman
2021-06-22 21:48 ` Michael Schmitz
2021-06-23 5:26 ` Michael Schmitz
2021-06-23 14:36 ` Eric W. Biederman
2021-06-22 0:01 ` Michael Schmitz
2021-06-22 20:04 ` Michael Schmitz
2021-06-22 20:18 ` Al Viro
2021-06-22 21:57 ` Michael Schmitz
2021-06-21 20:03 ` Eric W. Biederman
2021-06-21 23:15 ` Linus Torvalds
2021-06-22 20:52 ` Eric W. Biederman [this message]
2021-06-23 0:41 ` Linus Torvalds
2021-06-23 14:33 ` Eric W. Biederman
2021-06-24 18:57 ` [PATCH 0/9] Refactoring exit Eric W. Biederman
2021-06-24 18:59 ` [PATCH 1/9] signal/sh: Use force_sig(SIGKILL) instead of do_group_exit(SIGKILL) Eric W. Biederman
2021-06-24 18:59 ` [PATCH 2/9] signal/seccomp: Refactor seccomp signal and coredump generation Eric W. Biederman
2021-06-26 3:17 ` Kees Cook
2021-06-28 19:21 ` Eric W. Biederman
2021-06-28 14:34 ` [signal/seccomp] 3fdd8c68c2: kernel-selftests.seccomp.seccomp_bpf.fail kernel test robot
2021-06-24 19:00 ` [PATCH 3/9] signal/seccomp: Dump core when there is only one live thread Eric W. Biederman
2021-06-26 3:20 ` Kees Cook
2021-06-24 19:01 ` [PATCH 4/9] signal: Factor start_group_exit out of complete_signal Eric W. Biederman
2021-06-24 20:04 ` Linus Torvalds
2021-06-25 8:47 ` kernel test robot
2021-06-26 3:24 ` Kees Cook
2021-06-24 19:01 ` [PATCH 5/9] signal/group_exit: Use start_group_exit in place of do_group_exit Eric W. Biederman
2021-06-26 3:35 ` Kees Cook
2021-06-24 19:02 ` [PATCH 6/9] signal: Fold do_group_exit into get_signal fixing io_uring threads Eric W. Biederman
2021-06-26 3:42 ` Kees Cook
2021-06-28 19:25 ` Eric W. Biederman
2021-06-24 19:02 ` [PATCH 7/9] signal: Make individual tasks exiting a first class concept Eric W. Biederman
2021-06-24 20:11 ` Linus Torvalds
2021-06-24 21:37 ` Eric W. Biederman
2021-06-24 19:03 ` [PATCH 8/9] signal/task_exit: Use start_task_exit in place of do_exit Eric W. Biederman
2021-06-26 5:56 ` Kees Cook
2021-06-24 19:03 ` [PATCH 9/9] signal: Move PTRACE_EVENT_EXIT into get_signal Eric W. Biederman
2021-06-24 22:45 ` [PATCH 0/9] Refactoring exit Al Viro
2021-06-27 22:13 ` Al Viro
2021-06-27 22:59 ` Michael Schmitz
2021-06-28 7:31 ` Geert Uytterhoeven
2021-06-28 16:20 ` Eric W. Biederman
2021-06-28 17:14 ` Michael Schmitz
2021-06-28 19:17 ` Geert Uytterhoeven
2021-06-28 20:13 ` Michael Schmitz
2021-06-28 21:18 ` Geert Uytterhoeven
2021-06-28 23:42 ` Michael Schmitz
2021-06-29 20:28 ` [CFT][PATCH] exit/bdflush: Remove the deprecated bdflush system call Eric W. Biederman
2021-06-29 21:45 ` Michael Schmitz
2021-06-30 8:24 ` Geert Uytterhoeven
2021-06-30 8:37 ` Arnd Bergmann
2021-06-30 12:30 ` Cyril Hrubis
2021-06-28 19:02 ` [PATCH 0/9] Refactoring exit Eric W. Biederman
2021-06-21 19:24 ` Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads Al Viro
2021-06-21 23:24 ` Michael Schmitz
2021-06-16 7:38 ` Geert Uytterhoeven
2021-06-16 19:40 ` Michael Schmitz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87tulpbp19.fsf@disp2133 \
--to=ebiederm@xmission.com \
--cc=arnd@kernel.org \
--cc=axboe@kernel.dk \
--cc=geert@linux-m68k.org \
--cc=ink@jurassic.park.msu.ru \
--cc=keescook@chromium.org \
--cc=ley.foon.tan@intel.com \
--cc=linux-alpha@vger.kernel.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-m68k@lists.linux-m68k.org \
--cc=mattst88@gmail.com \
--cc=oleg@redhat.com \
--cc=rth@twiddle.net \
--cc=schmitzmic@gmail.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).