From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754840AbcFUAGQ (ORCPT ); Mon, 20 Jun 2016 20:06:16 -0400 Received: from mail.kernel.org ([198.145.29.136]:60628 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752494AbcFUAFa (ORCPT ); Mon, 20 Jun 2016 20:05:30 -0400 From: Andy Lutomirski To: x86@kernel.org, linux-kernel@vger.kernel.org Cc: Borislav Petkov , Andy Lutomirski , Pedro Alves , Oleg Nesterov , Kees Cook Subject: [PATCH v3 2/3] x86/signal: Rewire the restart_block() syscall to have a constant nr Date: Mon, 20 Jun 2016 16:39:53 -0700 Message-Id: <6347a4fe9e8c67d511e7b8a68e8ee5ffb02ca968.1466464928.git.luto@kernel.org> X-Mailer: git-send-email 2.5.5 In-Reply-To: References: In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Suppose a 64-bit task A traces a 32-bit task B. B makes a syscall that uses ERESTART_RESTARTBLOCK and gets a signal. A catches syscall exit, snapshots B's regs, changes the regs, and resumes. Then A restores the snapshot of B's regs. A expects B to resume where it left off when snapshotted, which means that B should execute sys_restart_block(). We have a big set of hacks in place that make this work in some cases, but in this particular case, the hacks fall apart. When A restores regs, because A is 64-bit, ptrace() will *not* set TS_I386_REGS_POKED. But, because B isn't stopped in a syscall, TS_COMPAT won't be set either. As a result, the current code will set nr=219 (the 64-bit __NR_restart_syscall) and then promptly invoke madvise() (the 32-bit syscall with nr==219). This patch fixes the bug and simplifies the code simultaneous by simply wiring up nr==380 to refer to sys_restart_block() in all cases. Cc: Pedro Alves Cc: Oleg Nesterov Cc: Kees Cook Signed-off-by: Andy Lutomirski --- arch/x86/entry/syscalls/syscall_32.tbl | 2 ++ arch/x86/entry/syscalls/syscall_64.tbl | 3 +++ arch/x86/kernel/signal.c | 34 ++++++++-------------------------- 3 files changed, 13 insertions(+), 26 deletions(-) diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 4cddd17153fb..9fc9272d0c41 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -386,3 +386,5 @@ 377 i386 copy_file_range sys_copy_file_range 378 i386 preadv2 sys_preadv2 compat_sys_preadv2 379 i386 pwritev2 sys_pwritev2 compat_sys_pwritev2 +# Same as restart_syscall but with the same nr for all ABIs. +380 i386 restart_syscall2 sys_restart_syscall diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 555263e385c9..f074952eaad8 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -336,6 +336,9 @@ 327 64 preadv2 sys_preadv2 328 64 pwritev2 sys_pwritev2 +# Same as restart_syscall but with the same nr for all ABIs. +380 common restart_syscall2 sys_restart_syscall + # # x32-specific system call numbers start at 512 to avoid cache impact # for native 64-bit operation. diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 6b952e1d8db8..b9f5133867f3 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -761,35 +761,17 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs) static inline unsigned long get_nr_restart_syscall(const struct pt_regs *regs) { /* - * This function is fundamentally broken as currently - * implemented. - * - * The idea is that we want to trigger a call to the - * restart_block() syscall and that we want in_ia32_syscall(), - * in_x32_syscall(), etc. to match whatever they were in the - * syscall being restarted. We assume that the syscall - * instruction at (regs->ip - 2) matches whatever syscall - * instruction we used to enter in the first place. - * - * The problem is that we can get here when ptrace pokes - * syscall-like values into regs even if we're not in a syscall - * at all. - * - * For now, we maintain historical behavior and guess based on - * stored state. We could do better by saving the actual - * syscall arch in restart_block or (with caveats on x32) by - * checking if regs->ip points to 'int $0x80'. The current - * behavior is incorrect if a tracer has a different bitness - * than the tracee. + * We can't reliably distinguish between restarting an i386 + * syscall and an x86_64 syscall without decoding the + * instruction at regs->ip -= 2. Rather than trying, restart + * using __NR_restart_syscall2, which works regardless of ABI. + * We still want to preserve the x32 state, but we can do that + * directly. */ -#ifdef CONFIG_IA32_EMULATION - if (current_thread_info()->status & (TS_COMPAT|TS_I386_REGS_POKED)) - return __NR_ia32_restart_syscall; -#endif #ifdef CONFIG_X86_X32_ABI - return __NR_restart_syscall | (regs->orig_ax & __X32_SYSCALL_BIT); + return __NR_restart_syscall2 | (regs->orig_ax & __X32_SYSCALL_BIT); #else - return __NR_restart_syscall; + return __NR_restart_syscall2; #endif } -- 2.5.5