All of lore.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: linux-kernel@vger.kernel.org, kernel@collabora.com,
	willy@infradead.org, luto@kernel.org, gofmanp@gmail.com,
	keescook@chromium.org, linux-kselftest@vger.kernel.org,
	shuah@kernel.org, Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [PATCH v4 1/2] kernel: Implement selective syscall userspace redirection
Date: Mon, 20 Jul 2020 12:08:47 +0200	[thread overview]
Message-ID: <87v9iimrbk.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <20200716193141.4068476-2-krisman@collabora.com>

Gabriel,

Gabriel Krisman Bertazi <krisman@collabora.com> writes:
> Introduce a mechanism to quickly disable/enable syscall handling for a
> specific process and redirect to userspace via SIGSYS.  This is useful
> for processes with parts that require syscall redirection and parts that
> don't, but who need to perform this boundary crossing really fast,
> without paying the cost of a system call to reconfigure syscall handling
> on each boundary transition.  This is particularly important for Windows
> games running over Wine.
>
> The proposed interface looks like this:
>
>   prctl(PR_SET_SYSCALL_USER_DISPATCH, <op>, <start_addr>, <end_addr>, [selector])
>
> The range [<start_addr>,<end_addr>] is a part of the process memory map
> that is allowed to by-pass the redirection code and dispatch syscalls
> directly, such that in fast paths a process doesn't need to disable the
> trap nor the kernel has to check the selector.  This is essential to
> return from SIGSYS to a blocked area without triggering another SIGSYS
> from rt_sigreturn.

Why isn't rt_sigreturn() exempt from that redirection in the first place?

> ---
>  arch/Kconfig                          | 20 ++++++
>  arch/x86/Kconfig                      |  1 +
>  arch/x86/entry/common.c               |  5 ++
>  arch/x86/include/asm/thread_info.h    |  4 +-
>  arch/x86/kernel/signal_compat.c       |  2 +-
>  fs/exec.c                             |  2 +
>  include/linux/sched.h                 |  3 +
>  include/linux/syscall_user_dispatch.h | 50 +++++++++++++++
>  include/uapi/asm-generic/siginfo.h    |  3 +-
>  include/uapi/linux/prctl.h            |  5 ++
>  kernel/Makefile                       |  1 +
>  kernel/fork.c                         |  1 +
>  kernel/sys.c                          |  5 ++
>  kernel/syscall_user_dispatch.c        | 92 +++++++++++++++++++++++++++

A big combo patch is not how we do that. Please split it up into the
core part and a patch enabling it for a particular architexture.

As I said in my reply to Andy, this wants to go on top of the generic
entry/exit work stuff:

  https://lore.kernel.org/r/20200716182208.180916541@linutronix.de

and then syscall_user_dispatch.c ends up in kernel/entry/ and the
dispatching function is not exposed outside of that directory.

I'm going to post a new version later today. Will cc you.

> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -93,6 +93,7 @@ struct thread_info {
>  #define TIF_NOTSC		16	/* TSC is not accessible in userland */
>  #define TIF_IA32		17	/* IA32 compatibility process */
>  #define TIF_SLD			18	/* Restore split lock detection on context switch */
> +#define TIF_SYSCALL_USER_DISPATCH 19	/* Redirect syscall for userspace handling */

There are two other things out there which compete about the last TIF
bits on x86, so we need to clean that up first.

> +static void trigger_sigsys(struct pt_regs *regs)
> +{
> +	struct kernel_siginfo info;
> +
> +	clear_siginfo(&info);
> +	info.si_signo = SIGSYS;
> +	info.si_code = SYS_USER_DISPATCH;
> +	info.si_call_addr = (void __user *)KSTK_EIP(current);
> +	info.si_errno = 0;
> +	info.si_arch = syscall_get_arch(current);
> +	info.si_syscall = syscall_get_nr(current, regs);
> +
> +	force_sig_info(&info);
> +}
> +
> +int do_syscall_user_dispatch(struct pt_regs *regs)
> +{
> +	struct syscall_user_dispatch *sd = &current->syscall_dispatch;
> +	unsigned long ip = instruction_pointer(regs);
> +	char state;
> +
> +	if (likely(ip >= sd->dispatcher_start && ip <= sd->dispatcher_end))
> +		return 0;
> +
> +	if (likely(sd->selector)) {
> +		if (unlikely(__get_user(state, sd->selector)))

__get_user() mandates an explicit access_ok() which happened in the
prctl(). So this wants a comment why there is none right here.

> +			do_exit(SIGSEGV);
> +
> +		if (likely(state == 0))
> +			return 0;
> +
> +		if (state != 1)
> +			do_exit(SIGSEGV);

If that happens its going to be quite interesting to debug.

Also please use proper defines which are exposed to user space instead
of 0/1.

> +	}
> +
> +	syscall_rollback(current, regs);
> +	trigger_sigsys(regs);
> +
> +	return 1;
> +}
> +
> +int set_syscall_user_dispatch(int mode, unsigned long dispatcher_start,
> +			      unsigned long dispatcher_end, char __user *selector)
> +{
> +	switch (mode) {
> +	case PR_SYS_DISPATCH_OFF:
> +		if (dispatcher_start || dispatcher_end || selector)
> +			return -EINVAL;
> +		break;
> +	case PR_SYS_DISPATCH_ON:
> +		/*
> +		 * Validate the direct dispatcher region just for basic
> +		 * sanity.  If the user is able to submit a syscall from
> +		 * an address, that address is obviously valid.
> +		 */
> +		if (dispatcher_end < dispatcher_start)
> +			return -EINVAL;
> +
> +		if (selector && !access_ok(selector, 1))

  sizeof(*selector)

Thanks,

        tglx

  parent reply	other threads:[~2020-07-20 10:08 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-16 19:31 [PATCH v4 0/2] Syscall User Redirection Gabriel Krisman Bertazi
2020-07-16 19:31 ` [PATCH v4 1/2] kernel: Implement selective syscall userspace redirection Gabriel Krisman Bertazi
2020-07-16 21:06   ` Matthew Wilcox
2020-07-16 21:26     ` Kees Cook
2020-07-17  0:20   ` Andy Lutomirski
2020-07-17  2:15     ` Gabriel Krisman Bertazi
2020-07-17  4:48       ` Andy Lutomirski
2020-07-21 12:06         ` Mark Rutland
2020-07-20  9:23     ` Thomas Gleixner
2020-07-20  9:44       ` Will Deacon
2020-07-20 10:08   ` Thomas Gleixner [this message]
2020-07-20 13:46     ` Gabriel Krisman Bertazi
2020-07-16 19:31 ` [PATCH v4 2/2] selftests: Add kselftest for syscall user dispatch Gabriel Krisman Bertazi
2020-07-16 20:04 ` [PATCH v4 0/2] Syscall User Redirection Kees Cook
2020-07-16 20:22   ` Christian Brauner
2020-07-16 20:25     ` Kees Cook
2020-07-16 20:29       ` Christian Brauner
2020-07-16 20:30         ` Gabriel Krisman Bertazi
2020-07-16 21:06           ` Carlos O'Donell
2020-08-02 12:01 ` Pavel Machek
2020-08-04 14:26   ` Gabriel Krisman Bertazi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v9iimrbk.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=gofmanp@gmail.com \
    --cc=keescook@chromium.org \
    --cc=kernel@collabora.com \
    --cc=krisman@collabora.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=shuah@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.