From: Thomas Gleixner <tglx@linutronix.de>
To: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: linux-kernel@vger.kernel.org, kernel@collabora.com,
willy@infradead.org, luto@kernel.org, gofmanp@gmail.com,
keescook@chromium.org, linux-kselftest@vger.kernel.org,
shuah@kernel.org, Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [PATCH v4 1/2] kernel: Implement selective syscall userspace redirection
Date: Mon, 20 Jul 2020 12:08:47 +0200
Message-ID: <87v9iimrbk.fsf@nanos.tec.linutronix.de>
In-Reply-To: <20200716193141.4068476-2-krisman@collabora.com>
Gabriel,
Gabriel Krisman Bertazi <krisman@collabora.com> writes:
> Introduce a mechanism to quickly disable/enable syscall handling for a
> specific process and redirect to userspace via SIGSYS. This is useful
> for processes with parts that require syscall redirection and parts that
> don't, but who need to perform this boundary crossing really fast,
> without paying the cost of a system call to reconfigure syscall handling
> on each boundary transition. This is particularly important for Windows
> games running over Wine.
>
> The proposed interface looks like this:
>
> prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <start_addr>, <end_addr>, [selector])
>
> The range [<start_addr>,<end_addr>] is a part of the process memory map
> that is allowed to by-pass the redirection code and dispatch syscalls
> directly, such that in fast paths a process doesn't need to disable the
> trap nor the kernel has to check the selector. This is essential to
> return from SIGSYS to a blocked area without triggering another SIGSYS
> from rt_sigreturn.
Why isn't rt_sigreturn() exempt from that redirection in the first place?
> ---
> arch/Kconfig | 20 ++++++
> arch/x86/Kconfig | 1 +
> arch/x86/entry/common.c | 5 ++
> arch/x86/include/asm/thread_info.h | 4 +-
> arch/x86/kernel/signal_compat.c | 2 +-
> fs/exec.c | 2 +
> include/linux/sched.h | 3 +
> include/linux/syscall_user_dispatch.h | 50 +++++++++++++++
> include/uapi/asm-generic/siginfo.h | 3 +-
> include/uapi/linux/prctl.h | 5 ++
> kernel/Makefile | 1 +
> kernel/fork.c | 1 +
> kernel/sys.c | 5 ++
> kernel/syscall_user_dispatch.c | 92 +++++++++++++++++++++++++++
A big combo patch is not how we do that. Please split it up into the
core part and a patch enabling it for a particular architecture.
As I said in my reply to Andy, this wants to go on top of the generic
entry/exit work:
https://lore.kernel.org/r/20200716182208.180916541@linutronix.de
and then syscall_user_dispatch.c ends up in kernel/entry/ and the
dispatching function is not exposed outside of that directory.
I'm going to post a new version later today. Will cc you.
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -93,6 +93,7 @@ struct thread_info {
> #define TIF_NOTSC 16 /* TSC is not accessible in userland */
> #define TIF_IA32 17 /* IA32 compatibility process */
> #define TIF_SLD 18 /* Restore split lock detection on context switch */
> +#define TIF_SYSCALL_USER_DISPATCH 19 /* Redirect syscall for userspace handling */
There are two other things out there which compete for the last TIF
bits on x86, so we need to clean that up first.
> +static void trigger_sigsys(struct pt_regs *regs)
> +{
> + struct kernel_siginfo info;
> +
> + clear_siginfo(&info);
> + info.si_signo = SIGSYS;
> + info.si_code = SYS_USER_DISPATCH;
> + info.si_call_addr = (void __user *)KSTK_EIP(current);
> + info.si_errno = 0;
> + info.si_arch = syscall_get_arch(current);
> + info.si_syscall = syscall_get_nr(current, regs);
> +
> + force_sig_info(&info);
> +}
> +
> +int do_syscall_user_dispatch(struct pt_regs *regs)
> +{
> + struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> + unsigned long ip = instruction_pointer(regs);
> + char state;
> +
> + if (likely(ip >= sd->dispatcher_start && ip <= sd->dispatcher_end))
> + return 0;
> +
> + if (likely(sd->selector)) {
> + if (unlikely(__get_user(state, sd->selector)))
__get_user() requires an explicit access_ok(), which already happened in
the prctl(). So this wants a comment explaining why there is none right
here.
> + do_exit(SIGSEGV);
> +
> + if (likely(state == 0))
> + return 0;
> +
> + if (state != 1)
> + do_exit(SIGSEGV);
If that happens it's going to be quite interesting to debug.
Also please use proper defines which are exposed to user space instead
of 0/1.
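To illustrate the point about defines, here is a minimal userspace model of
the decision logic in do_syscall_user_dispatch(), a sketch only and not
kernel code. The names DISPATCH_SELECTOR_ALLOW, DISPATCH_SELECTOR_BLOCK,
enum dispatch_action, struct dispatch_cfg and dispatch_decide() are all
hypothetical and do not appear in the patch. The selector is read through a
plain pointer here, where the kernel would use __get_user().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the selector states; the patch uses bare 0 and 1. */
#define DISPATCH_SELECTOR_ALLOW 0	/* let the syscall run */
#define DISPATCH_SELECTOR_BLOCK 1	/* redirect to userspace via SIGSYS */

/* Hypothetical result codes, used by this model only. */
enum dispatch_action {
	DISPATCH_RUN_SYSCALL,	/* patch: return 0 */
	DISPATCH_SEND_SIGSYS,	/* patch: rollback, SIGSYS, return 1 */
	DISPATCH_KILL_TASK,	/* patch: do_exit(SIGSEGV) */
};

struct dispatch_cfg {
	uintptr_t dispatcher_start;
	uintptr_t dispatcher_end;	/* inclusive, as in the quoted patch */
	const char *selector;		/* may be NULL: no selector configured */
};

static enum dispatch_action dispatch_decide(const struct dispatch_cfg *cfg,
					    uintptr_t ip)
{
	/* Syscalls issued from the dispatcher region are never redirected. */
	if (ip >= cfg->dispatcher_start && ip <= cfg->dispatcher_end)
		return DISPATCH_RUN_SYSCALL;

	if (cfg->selector) {
		switch (*cfg->selector) {
		case DISPATCH_SELECTOR_ALLOW:
			return DISPATCH_RUN_SYSCALL;
		case DISPATCH_SELECTOR_BLOCK:
			break;
		default:
			/* Any other value is treated as a fatal error. */
			return DISPATCH_KILL_TASK;
		}
	}

	return DISPATCH_SEND_SIGSYS;
}
```

With named states the invalid-value case becomes an explicit default
branch, which makes it easier to see (and to document) what userspace did
wrong when the task gets killed.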
> + }
> +
> + syscall_rollback(current, regs);
> + trigger_sigsys(regs);
> +
> + return 1;
> +}
> +
> +int set_syscall_user_dispatch(int mode, unsigned long dispatcher_start,
> + unsigned long dispatcher_end, char __user *selector)
> +{
> + switch (mode) {
> + case PR_SYS_DISPATCH_OFF:
> + if (dispatcher_start || dispatcher_end || selector)
> + return -EINVAL;
> + break;
> + case PR_SYS_DISPATCH_ON:
> + /*
> + * Validate the direct dispatcher region just for basic
> + * sanity. If the user is able to submit a syscall from
> + * an address, that address is obviously valid.
> + */
> + if (dispatcher_end < dispatcher_start)
> + return -EINVAL;
> +
> + if (selector && !access_ok(selector, 1))
sizeof(*selector)
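The reason for sizeof(*selector) over a literal 1 is that the size then
follows the pointee type if that type ever changes. A tiny userspace
sketch of the idiom, where selector_access_size() is a hypothetical helper
and not something from the patch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: derive the checked length from the pointee type
 * instead of hard-coding it. The operand of sizeof is not evaluated, so
 * passing NULL here does not dereference anything.
 */
static size_t selector_access_size(const char *selector)
{
	return sizeof(*selector);
}
```

For a char selector this is 1 by definition, so the generated code is the
same; only the source becomes robust against a later type change.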
Thanks,
tglx