From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Michael Schmitz <schmitzmic@gmail.com>,
linux-arch <linux-arch@vger.kernel.org>,
Jens Axboe <axboe@kernel.dk>, Oleg Nesterov <oleg@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Richard Henderson <rth@twiddle.net>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
Matt Turner <mattst88@gmail.com>,
alpha <linux-alpha@vger.kernel.org>,
Geert Uytterhoeven <geert@linux-m68k.org>,
linux-m68k <linux-m68k@lists.linux-m68k.org>,
Arnd Bergmann <arnd@kernel.org>,
Ley Foon Tan <ley.foon.tan@intel.com>, Tejun Heo <tj@kernel.org>,
Kees Cook <keescook@chromium.org>
Subject: Re: Kernel stack read with PTRACE_EVENT_EXIT and io_uring threads
Date: Tue, 22 Jun 2021 11:39:40 -0500
Message-ID: <87pmwddfar.fsf@disp2133>
In-Reply-To: <YNEbM+B8Su7GDCSo@zeniv-ca.linux.org.uk> (Al Viro's message of "Mon, 21 Jun 2021 23:05:23 +0000")

Al Viro <viro@zeniv.linux.org.uk> writes:
> On Mon, Jun 21, 2021 at 11:50:56AM -0500, Eric W. Biederman wrote:
>> Al Viro <viro@zeniv.linux.org.uk> writes:
>>
>> > On Mon, Jun 21, 2021 at 01:54:56PM +0000, Al Viro wrote:
>> >> On Tue, Jun 15, 2021 at 02:58:12PM -0700, Linus Torvalds wrote:
>> >>
>> >> > And I think our horrible "kernel threads return to user space when
>> >> > done" is absolutely horrifically nasty. Maybe of the clever sort, but
>> >> > mostly of the historical horror sort.
>> >>
>> >> How would you prefer to handle that, then? Separate magical path from
>> >> kernel_execve() to switch to userland? We used to have something of
>> >> that sort, and that had been a real horror...
>> >>
>> >> As it is, it's "kernel thread is spawned at the point similar to
>> >> ret_from_fork(), runs the payload (which almost never returns) and
>> >> then proceeds out to userland, same way fork(2) would've done."
>> >> That way kernel_execve() doesn't have to do anything magical.
>> >>
>> >> Al, digging through the old notes and current call graph...
>> >
>> > FWIW, the major assumption back then had been that get_signal(),
>> > signal_delivered() and all associated machinery (including coredumps)
>> > runs *only* from SIGPENDING/NOTIFY_SIGNAL handling.
>> >
>> > And "has complete registers on stack" is only a part of that;
>> > there was other fun stuff in the area ;-/ Do we want coredumps for
>> > those, and if we do, will the de_thread stuff work there?
>>
>> Do we want coredumps from processes that use io_uring? yes
>> Exactly what we want from io_uring threads is less clear. We can't
>> really give much that is meaningful beyond the thread ids of the
>> io_uring threads.
>>
>> What problems are you seeing beyond the missing registers on the
>> stack for kernel threads?
>>
>> I don't immediately see the connection between coredumps and de_thread.
>>
>> The function de_thread arranges for the fatal_signal_pending to be true,
>> and that should work just fine for io_uring threads. The io_uring
>> threads process the fatal_signal with get_signal and then proceed to
>> exit eventually calling do_exit.
>
> I would like to see the testing in cases when the io-uring thread is
> the one getting hit by initial signal and when it's the normal one
> with associated io-uring ones. The thread-collecting logics at least
> used to depend upon fairly subtle assumptions, and "kernel threads
> obviously can't show up as candidates" used to narrow the analysis
> down...
>
> In any case, WTF would we allow reads or writes to *any* registers of
> such threads? It's not as simple as "just return zeroes", BTW - the
> values allowed in special registers might have non-trivial constraints
> on them. The same goes for coredump - we don't _have_ registers to
> dump for those, period.
>
> Looks like the first things to do would be
> * prohibit ptrace accessing any regsets of worker threads
> * make coredump skip all register notes for those

Skipping register notes is fine. Prohibiting ptrace access to any
regsets of worker threads is interesting. I think that was tried and
shown to confuse gdb. So the conclusion was just to provide a fake set
of registers.

That appears to work up to the point of dealing with architectures
that have their magic caller-saved optimization (like alpha and m68k),
and no check that all of the registers were saved when accessed. Adding
a dummy switch stack frame for the kernel threads on those architectures
looks like a good/cheap solution at first glance.
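
To make the "fake set of registers" idea concrete, here is a minimal
user-space sketch in C. It is not kernel code: struct model_task,
model_read_regs() and NREGS are hypothetical names made up for
illustration, and returning plain zeroes is a simplification, since
special registers may have constraints on the values they can hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define NREGS 8

struct model_task {
	bool is_worker;            /* stands in for an io_uring worker thread */
	bool has_saved_regs;       /* true only if a full register frame was saved */
	unsigned long regs[NREGS]; /* meaningful only when has_saved_regs is true */
};

/*
 * Copy the registers a debugger would be shown into out[].
 * Worker threads, and any task without a saved frame, get zeroes
 * rather than whatever happens to sit on the stack.
 * Returns true if real registers were copied, false if they were faked.
 */
static bool model_read_regs(const struct model_task *t,
			    unsigned long out[NREGS])
{
	assert(t != NULL && out != NULL);

	if (t->is_worker || !t->has_saved_regs) {
		memset(out, 0, NREGS * sizeof(out[0]));
		return false;
	}
	memcpy(out, t->regs, NREGS * sizeof(out[0]));
	return true;
}
```

The point of the explicit has_saved_regs flag in this model is the
missing check mentioned above: the reader never trusts the frame unless
something recorded that it was actually saved.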

> Note, BTW, that kernel_thread() and kernel_execve() do *NOT* step into
> ptrace_notify() - explicit CLONE_UNTRACED for the former and zero
> current->ptrace in the caller of the latter. So fork and exec side
> has ptrace_event() crap limited to real syscalls.

That is where I thought we were. Thanks for confirming that.

> It's seccomp[1] and exit-related stuff that are messy...
>
> [1] "never trust somebody who introduces himself as Honest Joe and keeps
> carping on that all the time"; c.f. __secure_computing(), CONFIG_INTEGRITY,
> etc.