linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps
@ 2021-01-19 22:06 Andrei Vagin
  2021-01-19 22:06 ` [PATCH 1/3] arm64/ptrace: don't clobber task registers on syscall entry/exit traps Andrei Vagin
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Andrei Vagin @ 2021-01-19 22:06 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas
  Cc: Oleg Nesterov, linux-arm-kernel, linux-kernel, Andrei Vagin

Right now, ip/r12 (AArch32) or x7 (AArch64) is used to indicate whether
a stop was signalled from syscall entry or syscall exit. This means
that:

- Any writes by the tracer to this register during the stop are
  ignored/discarded.

- The actual value of the register is not available during the stop,
  so the tracer cannot save it and restore it later.

This series introduces NT_ARM_PRSTATUS to get all registers and makes it
possible to change the ip/r12 and x7 registers when a tracee is stopped
in a syscall trap.

For applications like User-mode Linux or gVisor, it is critical to have
access to the full set of registers at any moment. For example, they
need to change the values of all registers to emulate rt_sigreturn, and
they need the full set of registers to build a signal frame.
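
To illustrate the convention described above, here is a minimal sketch of
how a tracer might interpret the x7 slot of an NT_PRSTATUS dump taken at
a syscall stop on AArch64. The names classify_syscall_stop and enum
stop_dir are hypothetical and not part of this series; the values 0 and 1
mirror PTRACE_SYSCALL_ENTER and PTRACE_SYSCALL_EXIT. The result is only
meaningful when the tracee is already known to be in a syscall stop.

```c
#include <stdint.h>

/* Hypothetical tracer-side names, not part of this series. */
enum stop_dir {
	STOP_DIR_ENTER,		/* x7 slot reads 0 (PTRACE_SYSCALL_ENTER) */
	STOP_DIR_EXIT,		/* x7 slot reads 1 (PTRACE_SYSCALL_EXIT) */
	STOP_DIR_UNKNOWN,	/* any other value: not a direction marker */
};

/*
 * regs holds the 31 general-purpose registers as returned by
 * PTRACE_GETREGSET with NT_PRSTATUS at a syscall stop.
 */
static enum stop_dir classify_syscall_stop(const uint64_t regs[31])
{
	switch (regs[7]) {
	case 0:
		return STOP_DIR_ENTER;
	case 1:
		return STOP_DIR_EXIT;
	default:
		return STOP_DIR_UNKNOWN;
	}
}
```

Because the slot carries the direction marker, the value the tracee
itself holds in x7 cannot be recovered from this dump.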

Andrei Vagin (3):
  arm64/ptrace: don't clobber task registers on syscall entry/exit traps
  arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers
  selftest/arm64/ptrace: add a test for NT_ARM_PRSTATUS

 arch/arm64/include/asm/ptrace.h               |   5 +
 arch/arm64/kernel/ptrace.c                    | 130 +++++++++++-----
 include/uapi/linux/elf.h                      |   1 +
 tools/testing/selftests/arm64/Makefile        |   2 +-
 tools/testing/selftests/arm64/ptrace/Makefile |   6 +
 .../arm64/ptrace/ptrace_syscall_regs_test.c   | 142 ++++++++++++++++++
 6 files changed, 246 insertions(+), 40 deletions(-)
 create mode 100644 tools/testing/selftests/arm64/ptrace/Makefile
 create mode 100644 tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c

-- 
2.29.2



* [PATCH 1/3] arm64/ptrace: don't clobber task registers on syscall entry/exit traps
  2021-01-19 22:06 [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps Andrei Vagin
@ 2021-01-19 22:06 ` Andrei Vagin
  2021-01-27 15:14   ` Dave Martin
  2021-01-19 22:06 ` [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers Andrei Vagin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Andrei Vagin @ 2021-01-19 22:06 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas
  Cc: Oleg Nesterov, linux-arm-kernel, linux-kernel, Andrei Vagin

ip/r12 (AArch32) or x7 (AArch64) is used to indicate whether a stop was
signalled from syscall entry or syscall exit. This means that:

- Any writes by the tracer to this register during the stop are
  ignored/discarded.

- The actual value of the register is not available during the stop,
  so the tracer cannot save it and restore it later.

Right now, these registers are clobbered in tracehook_report_syscall.
This change moves that logic to gpr_get and compat_gpr_get, where the
registers are copied into a user-space buffer.

This makes it possible to change these registers and to introduce a new
NT_ARM_PRSTATUS note type to get the full set of registers.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/include/asm/ptrace.h |   5 ++
 arch/arm64/kernel/ptrace.c      | 104 +++++++++++++++++++-------------
 2 files changed, 67 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index e58bca832dff..0a9552b4f61e 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -170,6 +170,11 @@ static inline unsigned long pstate_to_compat_psr(const unsigned long pstate)
 	return psr;
 }
 
+enum ptrace_syscall_dir {
+	PTRACE_SYSCALL_ENTER = 0,
+	PTRACE_SYSCALL_EXIT,
+};
+
 /*
  * This struct defines the way the registers are stored on the stack during an
  * exception. Note that sizeof(struct pt_regs) has to be a multiple of 16 (for
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 8ac487c84e37..1863f080cb07 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -40,6 +40,7 @@
 #include <asm/syscall.h>
 #include <asm/traps.h>
 #include <asm/system_misc.h>
+#include <asm/ptrace.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/syscalls.h>
@@ -561,7 +562,33 @@ static int gpr_get(struct task_struct *target,
 		   struct membuf to)
 {
 	struct user_pt_regs *uregs = &task_pt_regs(target)->user_regs;
-	return membuf_write(&to, uregs, sizeof(*uregs));
+	unsigned long saved_reg;
+	int ret;
+
+	/*
+	 * We have some ABI weirdness here in the way that we handle syscall
+	 * exit stops because we indicate whether or not the stop has been
+	 * signalled from syscall entry or syscall exit by clobbering the general
+	 * purpose register x7.
+	 */
+	switch (target->ptrace_message) {
+	case PTRACE_EVENTMSG_SYSCALL_ENTRY:
+		saved_reg = uregs->regs[7];
+		uregs->regs[7] = PTRACE_SYSCALL_ENTER;
+		break;
+	case PTRACE_EVENTMSG_SYSCALL_EXIT:
+		saved_reg = uregs->regs[7];
+		uregs->regs[7] = PTRACE_SYSCALL_EXIT;
+		break;
+	}
+
+	ret = membuf_write(&to, uregs, sizeof(*uregs));
+
+	if (target->ptrace_message == PTRACE_EVENTMSG_SYSCALL_ENTRY ||
+	    target->ptrace_message == PTRACE_EVENTMSG_SYSCALL_EXIT)
+		uregs->regs[7] = saved_reg;
+
+	return ret;
 }
 
 static int gpr_set(struct task_struct *target, const struct user_regset *regset,
@@ -1221,10 +1248,40 @@ static int compat_gpr_get(struct task_struct *target,
 			  const struct user_regset *regset,
 			  struct membuf to)
 {
+	compat_ulong_t r12;
+	bool overwrite_r12;
 	int i = 0;
 
-	while (to.left)
-		membuf_store(&to, compat_get_user_reg(target, i++));
+	/*
+	 * We have some ABI weirdness here in the way that we handle syscall
+	 * exit stops because we indicate whether or not the stop has been
+	 * signalled from syscall entry or syscall exit by clobbering the
+	 * general purpose register r12.
+	 */
+	switch (target->ptrace_message) {
+	case PTRACE_EVENTMSG_SYSCALL_ENTRY:
+		r12 = PTRACE_SYSCALL_ENTER;
+		overwrite_r12 = true;
+		break;
+	case PTRACE_EVENTMSG_SYSCALL_EXIT:
+		r12 = PTRACE_SYSCALL_EXIT;
+		overwrite_r12 = true;
+		break;
+	default:
+		overwrite_r12 = false;
+		break;
+	}
+
+	while (to.left) {
+		compat_ulong_t val;
+
+		val = compat_get_user_reg(target, i);
+		if (overwrite_r12 && i == 12)
+			val = r12;
+		i++;
+		membuf_store(&to, val);
+	}
+
 	return 0;
 }
 
@@ -1740,53 +1797,16 @@ long arch_ptrace(struct task_struct *child, long request,
 	return ptrace_request(child, request, addr, data);
 }
 
-enum ptrace_syscall_dir {
-	PTRACE_SYSCALL_ENTER = 0,
-	PTRACE_SYSCALL_EXIT,
-};
-
 static void tracehook_report_syscall(struct pt_regs *regs,
 				     enum ptrace_syscall_dir dir)
 {
-	int regno;
-	unsigned long saved_reg;
-
-	/*
-	 * We have some ABI weirdness here in the way that we handle syscall
-	 * exit stops because we indicate whether or not the stop has been
-	 * signalled from syscall entry or syscall exit by clobbering a general
-	 * purpose register (ip/r12 for AArch32, x7 for AArch64) in the tracee
-	 * and restoring its old value after the stop. This means that:
-	 *
-	 * - Any writes by the tracer to this register during the stop are
-	 *   ignored/discarded.
-	 *
-	 * - The actual value of the register is not available during the stop,
-	 *   so the tracer cannot save it and restore it later.
-	 *
-	 * - Syscall stops behave differently to seccomp and pseudo-step traps
-	 *   (the latter do not nobble any registers).
-	 */
-	regno = (is_compat_task() ? 12 : 7);
-	saved_reg = regs->regs[regno];
-	regs->regs[regno] = dir;
-
 	if (dir == PTRACE_SYSCALL_ENTER) {
 		if (tracehook_report_syscall_entry(regs))
 			forget_syscall(regs);
-		regs->regs[regno] = saved_reg;
-	} else if (!test_thread_flag(TIF_SINGLESTEP)) {
-		tracehook_report_syscall_exit(regs, 0);
-		regs->regs[regno] = saved_reg;
 	} else {
-		regs->regs[regno] = saved_reg;
+		int singlestep = test_thread_flag(TIF_SINGLESTEP);
 
-		/*
-		 * Signal a pseudo-step exception since we are stepping but
-		 * tracer modifications to the registers may have rewound the
-		 * state machine.
-		 */
-		tracehook_report_syscall_exit(regs, 1);
+		tracehook_report_syscall_exit(regs, singlestep);
 	}
 }
 
-- 
2.29.2



* [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers
  2021-01-19 22:06 [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps Andrei Vagin
  2021-01-19 22:06 ` [PATCH 1/3] arm64/ptrace: don't clobber task registers on syscall entry/exit traps Andrei Vagin
@ 2021-01-19 22:06 ` Andrei Vagin
  2021-01-27 14:53   ` Dave Martin
  2021-01-19 22:06 ` [PATCH 3/3] selftest/arm64/ptrace: add a test for NT_ARM_PRSTATUS Andrei Vagin
  2021-01-27  8:10 ` [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps Andrei Vagin
  3 siblings, 1 reply; 8+ messages in thread
From: Andrei Vagin @ 2021-01-19 22:06 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas
  Cc: Oleg Nesterov, linux-arm-kernel, linux-kernel, Andrei Vagin

This is an alternative to NT_PRSTATUS, which clobbers ip/r12 on AArch32
and x7 on AArch64 when a tracee is stopped in a syscall entry or syscall
exit trap. NT_ARM_PRSTATUS returns these registers unmodified.

Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 arch/arm64/kernel/ptrace.c | 39 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/elf.h   |  1 +
 2 files changed, 40 insertions(+)

diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 1863f080cb07..b8e4c2ddf636 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -591,6 +591,15 @@ static int gpr_get(struct task_struct *target,
 	return ret;
 }
 
+static int gpr_get_full(struct task_struct *target,
+		   const struct user_regset *regset,
+		   struct membuf to)
+{
+	struct user_pt_regs *uregs = &task_pt_regs(target)->user_regs;
+
+	return membuf_write(&to, uregs, sizeof(*uregs));
+}
+
 static int gpr_set(struct task_struct *target, const struct user_regset *regset,
 		   unsigned int pos, unsigned int count,
 		   const void *kbuf, const void __user *ubuf)
@@ -1088,6 +1097,7 @@ static int tagged_addr_ctrl_set(struct task_struct *target, const struct
 
 enum aarch64_regset {
 	REGSET_GPR,
+	REGSET_GPR_FULL,
 	REGSET_FPR,
 	REGSET_TLS,
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
@@ -1119,6 +1129,14 @@ static const struct user_regset aarch64_regsets[] = {
 		.regset_get = gpr_get,
 		.set = gpr_set
 	},
+	[REGSET_GPR_FULL] = {
+		.core_note_type = NT_ARM_PRSTATUS,
+		.n = sizeof(struct user_pt_regs) / sizeof(u64),
+		.size = sizeof(u64),
+		.align = sizeof(u64),
+		.regset_get = gpr_get_full,
+		.set = gpr_set
+	},
 	[REGSET_FPR] = {
 		.core_note_type = NT_PRFPREG,
 		.n = sizeof(struct user_fpsimd_state) / sizeof(u32),
@@ -1225,6 +1243,7 @@ static const struct user_regset_view user_aarch64_view = {
 #ifdef CONFIG_COMPAT
 enum compat_regset {
 	REGSET_COMPAT_GPR,
+	REGSET_COMPAT_GPR_FULL,
 	REGSET_COMPAT_VFP,
 };
 
@@ -1285,6 +1304,18 @@ static int compat_gpr_get(struct task_struct *target,
 	return 0;
 }
 
+/* Unlike compat_gpr_get, compat_gpr_get_full doesn't overwrite r12. */
+static int compat_gpr_get_full(struct task_struct *target,
+			  const struct user_regset *regset,
+			  struct membuf to)
+{
+	int i = 0;
+
+	while (to.left)
+		membuf_store(&to, compat_get_user_reg(target, i++));
+	return 0;
+}
+
 static int compat_gpr_set(struct task_struct *target,
 			  const struct user_regset *regset,
 			  unsigned int pos, unsigned int count,
@@ -1435,6 +1466,14 @@ static const struct user_regset aarch32_regsets[] = {
 		.regset_get = compat_gpr_get,
 		.set = compat_gpr_set
 	},
+	[REGSET_COMPAT_GPR_FULL] = {
+		.core_note_type = NT_ARM_PRSTATUS,
+		.n = COMPAT_ELF_NGREG,
+		.size = sizeof(compat_elf_greg_t),
+		.align = sizeof(compat_elf_greg_t),
+		.regset_get = compat_gpr_get_full,
+		.set = compat_gpr_set
+	},
 	[REGSET_COMPAT_VFP] = {
 		.core_note_type = NT_ARM_VFP,
 		.n = VFP_STATE_SIZE / sizeof(compat_ulong_t),
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 30f68b42eeb5..a2086d19263a 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -426,6 +426,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_PACA_KEYS	0x407	/* ARM pointer authentication address keys */
 #define NT_ARM_PACG_KEYS	0x408	/* ARM pointer authentication generic key */
 #define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */
+#define NT_ARM_PRSTATUS		0x410   /* ARM general-purpose registers */
 #define NT_ARC_V2	0x600		/* ARCv2 accumulator/extra registers */
 #define NT_VMCOREDD	0x700		/* Vmcore Device Dump Note */
 #define NT_MIPS_DSP	0x800		/* MIPS DSP ASE registers */
-- 
2.29.2



* [PATCH 3/3] selftest/arm64/ptrace: add a test for NT_ARM_PRSTATUS
  2021-01-19 22:06 [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps Andrei Vagin
  2021-01-19 22:06 ` [PATCH 1/3] arm64/ptrace: don't clobber task registers on syscall entry/exit traps Andrei Vagin
  2021-01-19 22:06 ` [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers Andrei Vagin
@ 2021-01-19 22:06 ` Andrei Vagin
  2021-01-27  8:10 ` [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps Andrei Vagin
  3 siblings, 0 replies; 8+ messages in thread
From: Andrei Vagin @ 2021-01-19 22:06 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas
  Cc: Oleg Nesterov, linux-arm-kernel, linux-kernel, Andrei Vagin

Test output:
 TAP version 13
 1..1
 # selftests: arm64/ptrace: ptrace_syscall_regs_test
 # 1..3
 # ok 1 NT_PRSTATUS: x7 0
 # ok 2 NT_ARM_PRSTATUS: x7 686920776f726c64
 # ok 3 The child exited with code 0.
 # # Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
 ok 1 selftests: arm64/ptrace: ptrace_syscall_regs_test

Signed-off-by: Andrei Vagin <avagin@gmail.com>
---
 tools/testing/selftests/arm64/Makefile        |   2 +-
 tools/testing/selftests/arm64/ptrace/Makefile |   6 +
 .../arm64/ptrace/ptrace_syscall_regs_test.c   | 147 ++++++++++++++++++
 3 files changed, 154 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/arm64/ptrace/Makefile
 create mode 100644 tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c

diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
index 2c9d012797a7..704770a60ece 100644
--- a/tools/testing/selftests/arm64/Makefile
+++ b/tools/testing/selftests/arm64/Makefile
@@ -4,7 +4,7 @@
 ARCH ?= $(shell uname -m 2>/dev/null || echo not)
 
 ifneq (,$(filter $(ARCH),aarch64 arm64))
-ARM64_SUBTARGETS ?= tags signal pauth fp mte
+ARM64_SUBTARGETS ?= tags signal pauth fp mte ptrace
 else
 ARM64_SUBTARGETS :=
 endif
diff --git a/tools/testing/selftests/arm64/ptrace/Makefile b/tools/testing/selftests/arm64/ptrace/Makefile
new file mode 100644
index 000000000000..ca906ce3a581
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+
+CFLAGS += -g -I../../../../../usr/include/
+TEST_GEN_PROGS := ptrace_syscall_regs_test
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c b/tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c
new file mode 100644
index 000000000000..601378b7591d
--- /dev/null
+++ b/tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include <sys/types.h>
+#include <sys/ptrace.h>
+#include <sys/user.h>
+#include <sys/wait.h>
+#include <sys/uio.h>
+#include <linux/elf.h>
+#include <linux/unistd.h>
+
+#include "../../kselftest.h"
+
+#define TEST_VAL 0x686920776f726c64UL
+
+#define pr_p(func, fmt, ...)	func(fmt ": %m", ##__VA_ARGS__)
+
+#define pr_err(fmt, ...)						\
+	({								\
+		ksft_test_result_error(fmt "\n", ##__VA_ARGS__);		\
+		-1;							\
+	})
+
+#define pr_fail(fmt, ...)					\
+	({							\
+		ksft_test_result_fail(fmt "\n", ##__VA_ARGS__);	\
+		-1;						\
+	})
+
+#define pr_perror(fmt, ...)	pr_p(pr_err, fmt, ##__VA_ARGS__)
+
+static long loop(void *val)
+{
+	register long x0 __asm__("x0");
+	register void *x1 __asm__("x1") = val;
+	register long x8 __asm__("x8") = 555;
+
+	__asm__ volatile (
+		"again:\n"
+		"ldr x7, [x1, 0]\n"
+		"svc 0\n"
+		"str x7, [x1, 0]\n"
+			   : "=r"(x0)
+			   : "r"(x1), "r"(x8)
+			   : "x7", "memory"
+	);
+	return 0;
+}
+
+static int child(void)
+{
+	long  val = TEST_VAL;
+
+	loop(&val);
+	if (val != ~TEST_VAL) {
+		ksft_print_msg("Unexpected x7: %lx\n", val);
+		return 1;
+	}
+
+	return 0;
+}
+
+#ifndef PTRACE_SYSEMU
+#define PTRACE_SYSEMU 31
+#endif
+
+#ifndef NT_ARM_PRSTATUS
+#define NT_ARM_PRSTATUS 0x410
+#endif
+
+int main(int argc, char **argv)
+{
+	struct user_regs_struct regs = {};
+	struct iovec iov = {
+		.iov_base = &regs,
+		.iov_len = sizeof(struct user_regs_struct),
+	};
+	int status;
+	pid_t pid;
+
+	ksft_set_plan(3);
+
+	pid = fork();
+	if (pid == 0) {
+		kill(getpid(), SIGSTOP);
+		/* Propagate the result of the x7 check to the parent. */
+		_exit(child());
+	}
+	if (pid < 0)
+		return 1;
+
+	if (ptrace(PTRACE_ATTACH, pid, 0, 0))
+		return pr_perror("Can't attach to the child %d", pid);
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("Can't wait for the child %d", pid);
+	/* skip SIGSTOP */
+	if (ptrace(PTRACE_CONT, pid, 0, 0))
+		return pr_perror("Can't resume the child %d", pid);
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("Can't wait for the child %d", pid);
+
+	/* Resume the child to the next system call. */
+	if (ptrace(PTRACE_SYSEMU, pid, 0, 0))
+		return pr_perror("Can't resume the child %d", pid);
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("Can't wait for the child %d", pid);
+	if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
+		return pr_err("Unexpected status: %d", status);
+
+	/* Check that x7 is zero in the case of NT_PRSTATUS. */
+	if (ptrace(PTRACE_GETREGSET, pid, NT_PRSTATUS, &iov))
+		return pr_perror("Can't get child registers");
+	if (regs.regs[7] != 0)
+		return pr_fail("NT_PRSTATUS: unexpected x7: %llx", regs.regs[7]);
+	ksft_test_result_pass("NT_PRSTATUS: x7 %llx\n", regs.regs[7]);
+
+	/* Check that x7 isn't clobbered in the case of NT_ARM_PRSTATUS. */
+	if (ptrace(PTRACE_GETREGSET, pid, NT_ARM_PRSTATUS, &iov))
+		return pr_perror("Can't get child registers");
+	if (regs.regs[7] != TEST_VAL)
+		return pr_fail("NT_ARM_PRSTATUS: unexpected x7: %llx", regs.regs[7]);
+	ksft_test_result_pass("NT_ARM_PRSTATUS: x7 %llx\n", regs.regs[7]);
+
+	/* Check that the child will see a new value of x7. */
+	regs.regs[0] = 0;
+	regs.regs[7] = ~TEST_VAL;
+	if (ptrace(PTRACE_SETREGSET, pid, NT_PRSTATUS, &iov))
+		return pr_perror("Can't set child registers");
+
+	if (ptrace(PTRACE_CONT, pid, 0, 0))
+		return pr_perror("Can't resume the child %d", pid);
+	if (waitpid(pid, &status, 0) != pid)
+		return pr_perror("Can't wait for the child %d", pid);
+
+	if (status != 0)
+		return pr_fail("Child exited with code %d.", status);
+
+	ksft_test_result_pass("The child exited with code 0.\n");
+	ksft_exit_pass();
+	return 0;
+}
+
-- 
2.29.2



* Re: [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps
  2021-01-19 22:06 [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps Andrei Vagin
                   ` (2 preceding siblings ...)
  2021-01-19 22:06 ` [PATCH 3/3] selftest/arm64/ptrace: add a test for NT_ARM_PRSTATUS Andrei Vagin
@ 2021-01-27  8:10 ` Andrei Vagin
  3 siblings, 0 replies; 8+ messages in thread
From: Andrei Vagin @ 2021-01-27  8:10 UTC (permalink / raw)
  To: Will Deacon, Catalin Marinas, keno, dave.martin
  Cc: Oleg Nesterov, linux-arm-kernel, LKML, Andrei Vagin,
	Howard Zhang, Anthony Steinhauser

On Tue, Jan 19, 2021 at 2:08 PM Andrei Vagin <avagin@gmail.com> wrote:
>
> Right now, ip/r12 for AArch32 and x7 for AArch64 is used to indicate
> whether or not the stop has been signalled from syscall entry or syscall
> exit. This means that:
>
> - Any writes by the tracer to this register during the stop are
>   ignored/discarded.
>
> - The actual value of the register is not available during the stop,
>   so the tracer cannot save it and restore it later.
>
> This series introduces NT_ARM_PRSTATUS to get all registers and makes it
> possible to change ip/r12 and x7 registers when tracee is stopped in
> syscall traps.
>
> For applications like the user-mode Linux or gVisor, it is critical to
> have access to the full set of registers at any moment. For example,
> they need to change values of all registers to emulate rt_sigreturn and
> they need to have the full set of registers to build a signal frame.

I have found the thread [1] where Keno, Will, and Dave discussed the same
problem. If I understand this right, the problem was not fixed, because there
were no users who needed it.

gVisor is a general-purpose sandbox to run untrusted workloads. It has a
platform interface that is responsible for syscall interception, context
switching, and managing process address spaces. Right now, we have kvm and
ptrace platforms. The ptrace platform runs guest code in the context of stub
processes and intercepts syscalls with the help of PTRACE_SYSEMU. All system
calls are handled by the gVisor kernel, including rt_sigreturn and execve.
Signal handling happens inside the gVisor kernel too. Each stub process can
have more than one thread, but we don't bind guest threads to stub threads, and
we can run more than one guest thread in the context of one stub thread. Taking
all these facts into account, we need access to all registers at any moment
when a stub thread has been stopped.

We were able to introduce the workaround [3] for this issue. Each time a
stub process is stopped on a system call, we queue a fake signal and resume the
process so that it stops on the signal. It works, but it requires extra
interaction with the stub process, which is expensive. My benchmarks show that
this workaround slows down syscalls in gVisor by more than 50%. BTW: this is
one of the major reasons why PTRACE_SYSEMU was introduced instead of emulating
it via two calls of PTRACE_SYSCALL.


[1] https://lore.kernel.org/lkml/CABV8kRz0mKSc=u1LeonQSLroKJLOKWOWktCoGji2nvEBc=e7=w@mail.gmail.com/#r
[2] https://github.com/google/gvisor/issues/5238
[3] https://github.com/google/gvisor/commit/a44efaab6d4b815880749a39647fb3ed9634a489

>
> Andrei Vagin (3):
>   arm64/ptrace: don't clobber task registers on syscall entry/exit traps
>   arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers
>   selftest/arm64/ptrace: add a test for NT_ARM_PRSTATUS
>
>  arch/arm64/include/asm/ptrace.h               |   5 +
>  arch/arm64/kernel/ptrace.c                    | 130 +++++++++++-----
>  include/uapi/linux/elf.h                      |   1 +
>  tools/testing/selftests/arm64/Makefile        |   2 +-
>  tools/testing/selftests/arm64/ptrace/Makefile |   6 +
>  .../arm64/ptrace/ptrace_syscall_regs_test.c   | 142 ++++++++++++++++++
>  6 files changed, 246 insertions(+), 40 deletions(-)
>  create mode 100644 tools/testing/selftests/arm64/ptrace/Makefile
>  create mode 100644 tools/testing/selftests/arm64/ptrace/ptrace_syscall_regs_test.c
>
> --
> 2.29.2
>


* Re: [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers
  2021-01-19 22:06 ` [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers Andrei Vagin
@ 2021-01-27 14:53   ` Dave Martin
  2021-01-29  7:56     ` Andrei Vagin
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Martin @ 2021-01-27 14:53 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Will Deacon, Catalin Marinas, Oleg Nesterov, linux-arm-kernel,
	linux-kernel

On Tue, Jan 19, 2021 at 02:06:36PM -0800, Andrei Vagin wrote:
> This is an alternative to NT_PRSTATUS that clobbers ip/r12 on AArch32,
> x7 on AArch64 when a tracee is stopped in syscall entry or syscall exit
> traps.
> 
> Signed-off-by: Andrei Vagin <avagin@gmail.com>

This approach looks like it works, though I still think adding an option
for this under PTRACE_SETOPTIONS would be less intrusive.

Adding a shadow regset like this also looks like it would cause the gp
regs to be pointlessly dumped twice in a core dump.  Avoiding that
might require hacks in the core code...
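
For illustration only, here is a small user-space model of what gating
the x7 substitution on a tracer-set option could look like. Everything
in it is an assumption: MODEL_PTRACE_O_RAW_GPR is a made-up bit (no such
PTRACE_O_* flag exists), and struct model_tracee merely stands in for the
kernel's task state. It is a sketch of the idea, not a proposed
implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical option bit; no such PTRACE_O_* flag exists in the kernel. */
#define MODEL_PTRACE_O_RAW_GPR	(1u << 0)

/* Hypothetical stand-in for the tracee state the regset code reads. */
struct model_tracee {
	uint64_t regs[31];
	unsigned int ptrace_opts;	/* options the tracer has set */
	int in_syscall_stop;		/* non-zero at a syscall entry/exit stop */
	uint64_t dir;			/* 0 = enter, 1 = exit */
};

/* Copy the tracer's view of the registers into out[31]. */
static void model_read_gprs(const struct model_tracee *t, uint64_t out[31])
{
	memcpy(out, t->regs, sizeof(t->regs));

	/* Only the copy is patched; the tracee's own x7 is never touched. */
	if (t->in_syscall_stop && !(t->ptrace_opts & MODEL_PTRACE_O_RAW_GPR))
		out[7] = t->dir;
}
```

With a single regset and an opt-in bit, existing tracers keep the
current view, and no second set of gp regs would be needed.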


> ---
>  arch/arm64/kernel/ptrace.c | 39 ++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/elf.h   |  1 +
>  2 files changed, 40 insertions(+)
> 
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 1863f080cb07..b8e4c2ddf636 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -591,6 +591,15 @@ static int gpr_get(struct task_struct *target,
>  	return ret;
>  }
>  
> +static int gpr_get_full(struct task_struct *target,
> +		   const struct user_regset *regset,
> +		   struct membuf to)
> +{
> +	struct user_pt_regs *uregs = &task_pt_regs(target)->user_regs;
> +
> +	return membuf_write(&to, uregs, sizeof(*uregs));
> +}
> +
>  static int gpr_set(struct task_struct *target, const struct user_regset *regset,
>  		   unsigned int pos, unsigned int count,
>  		   const void *kbuf, const void __user *ubuf)
> @@ -1088,6 +1097,7 @@ static int tagged_addr_ctrl_set(struct task_struct *target, const struct
>  
>  enum aarch64_regset {
>  	REGSET_GPR,
> +	REGSET_GPR_FULL,

If we go with this approach, "REGSET_GPR_RAW" might be a preferable
name.  Both regsets represent all the regs ("full"), but REGSET_GPR is
mangled by the kernel.

>  	REGSET_FPR,
>  	REGSET_TLS,
>  #ifdef CONFIG_HAVE_HW_BREAKPOINT
> @@ -1119,6 +1129,14 @@ static const struct user_regset aarch64_regsets[] = {
>  		.regset_get = gpr_get,
>  		.set = gpr_set
>  	},
> +	[REGSET_GPR_FULL] = {
> +		.core_note_type = NT_ARM_PRSTATUS,

Similarly, something like NT_ARM_PRSTATUS_RAW or similar.

> +		.n = sizeof(struct user_pt_regs) / sizeof(u64),
> +		.size = sizeof(u64),
> +		.align = sizeof(u64),
> +		.regset_get = gpr_get_full,
> +		.set = gpr_set
> +	},
>  	[REGSET_FPR] = {
>  		.core_note_type = NT_PRFPREG,
>  		.n = sizeof(struct user_fpsimd_state) / sizeof(u32),
> @@ -1225,6 +1243,7 @@ static const struct user_regset_view user_aarch64_view = {
>  #ifdef CONFIG_COMPAT
>  enum compat_regset {
>  	REGSET_COMPAT_GPR,
> +	REGSET_COMPAT_GPR_FULL,
>  	REGSET_COMPAT_VFP,
>  };
>  
> @@ -1285,6 +1304,18 @@ static int compat_gpr_get(struct task_struct *target,
>  	return 0;
>  }
>  
> +/* compat_gpr_get_full doesn't  overwrite x12 like compat_gpr_get. */
> +static int compat_gpr_get_full(struct task_struct *target,
> +			  const struct user_regset *regset,
> +			  struct membuf to)
> +{
> +	int i = 0;
> +
> +	while (to.left)
> +		membuf_store(&to, compat_get_user_reg(target, i++));
> +	return 0;
> +}
> +
>  static int compat_gpr_set(struct task_struct *target,
>  			  const struct user_regset *regset,
>  			  unsigned int pos, unsigned int count,
> @@ -1435,6 +1466,14 @@ static const struct user_regset aarch32_regsets[] = {
>  		.regset_get = compat_gpr_get,
>  		.set = compat_gpr_set
>  	},
> +	[REGSET_COMPAT_GPR_FULL] = {
> +		.core_note_type = NT_ARM_PRSTATUS,
> +		.n = COMPAT_ELF_NGREG,
> +		.size = sizeof(compat_elf_greg_t),
> +		.align = sizeof(compat_elf_greg_t),
> +		.regset_get = compat_gpr_get_full,
> +		.set = compat_gpr_set
> +	},
>  	[REGSET_COMPAT_VFP] = {
>  		.core_note_type = NT_ARM_VFP,
>  		.n = VFP_STATE_SIZE / sizeof(compat_ulong_t),
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index 30f68b42eeb5..a2086d19263a 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -426,6 +426,7 @@ typedef struct elf64_shdr {
>  #define NT_ARM_PACA_KEYS	0x407	/* ARM pointer authentication address keys */
>  #define NT_ARM_PACG_KEYS	0x408	/* ARM pointer authentication generic key */
>  #define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */

What happened to 0x40a..0x40f?

[...]

Cheers
---Dave


* Re: [PATCH 1/3] arm64/ptrace: don't clobber task registers on syscall entry/exit traps
  2021-01-19 22:06 ` [PATCH 1/3] arm64/ptrace: don't clobber task registers on syscall entry/exit traps Andrei Vagin
@ 2021-01-27 15:14   ` Dave Martin
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2021-01-27 15:14 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Will Deacon, Catalin Marinas, Oleg Nesterov, linux-arm-kernel,
	linux-kernel

On Tue, Jan 19, 2021 at 02:06:35PM -0800, Andrei Vagin wrote:
> ip/r12 for AArch32 and x7 for AArch64 is used to indicate whether or not
> the stop has been signalled from syscall entry or syscall exit. This
> means that:
> 
> - Any writes by the tracer to this register during the stop are
>   ignored/discarded.
> 
> - The actual value of the register is not available during the stop,
>   so the tracer cannot save it and restore it later.
> 
> Right now, these registers are clobbered in tracehook_report_syscall.
> This change moves this logic to gpr_get and compat_gpr_get where
> registers are copied into a user-space buffer.
> 
> This will allow to change these registers and to introduce a new
> NT_ARM_PRSTATUS command to get the full set of registers.
> 
> Signed-off-by: Andrei Vagin <avagin@gmail.com>
> ---
>  arch/arm64/include/asm/ptrace.h |   5 ++
>  arch/arm64/kernel/ptrace.c      | 104 +++++++++++++++++++-------------
>  2 files changed, 67 insertions(+), 42 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index e58bca832dff..0a9552b4f61e 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -170,6 +170,11 @@ static inline unsigned long pstate_to_compat_psr(const unsigned long pstate)
>  	return psr;
>  }
>  
> +enum ptrace_syscall_dir {
> +	PTRACE_SYSCALL_ENTER = 0,
> +	PTRACE_SYSCALL_EXIT,
> +};
> +
>  /*
>   * This struct defines the way the registers are stored on the stack during an
>   * exception. Note that sizeof(struct pt_regs) has to be a multiple of 16 (for
> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index 8ac487c84e37..1863f080cb07 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -40,6 +40,7 @@
>  #include <asm/syscall.h>
>  #include <asm/traps.h>
>  #include <asm/system_misc.h>
> +#include <asm/ptrace.h>
>  
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/syscalls.h>
> @@ -561,7 +562,33 @@ static int gpr_get(struct task_struct *target,
>  		   struct membuf to)
>  {
>  	struct user_pt_regs *uregs = &task_pt_regs(target)->user_regs;
> -	return membuf_write(&to, uregs, sizeof(*uregs));
> +	unsigned long saved_reg;
> +	int ret;
> +
> +	/*
> +	 * We have some ABI weirdness here in the way that we handle syscall
> +	 * exit stops because we indicate whether or not the stop has been
> +	 * signalled from syscall entry or syscall exit by clobbering the general
> +	 * purpose register x7.
> +	 */
> +	switch (target->ptrace_message) {
> +	case PTRACE_EVENTMSG_SYSCALL_ENTRY:
> +		saved_reg = uregs->regs[7];
> +		uregs->regs[7] = PTRACE_SYSCALL_ENTER;
> +		break;
> +	case PTRACE_EVENTMSG_SYSCALL_EXIT:
> +		saved_reg = uregs->regs[7];
> +		uregs->regs[7] = PTRACE_SYSCALL_EXIT;
> +		break;
> +	}
> +
> +	ret =  membuf_write(&to, uregs, sizeof(*uregs));
> +
> +	if (target->ptrace_message == PTRACE_EVENTMSG_SYSCALL_ENTRY ||
> +	    target->ptrace_message == PTRACE_EVENTMSG_SYSCALL_EXIT)
> +		uregs->regs[7] = saved_reg;

This might be a reasonable cleanup even if the extra regset isn't
introduced: it makes it clear that we're not changing the user registers
here, just the tracer's view of them.

I'm assuming it doesn't break tracing anywhere else.  I can't think of
anything it would break just now, but I haven't spent much time looking
into it.


Can you not just unconditionally back up and restore regs[7] here?  e.g.

	saved_reg = uregs->regs[7];

	switch (target->ptrace_message) {
	case PTRACE_EVENTMSG_SYSCALL_ENTRY:
	case PTRACE_EVENTMSG_SYSCALL_EXIT:
		uregs->regs[7] = target->ptrace_message;
	}

	ret = membuf_write(...);

	uregs->regs[7] = saved_reg;
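
A minimal userspace sketch of the unconditional save/restore idea above, for illustration only. `struct mock_regs` and `mock_gpr_get()` are hypothetical stand-ins for `struct user_pt_regs` and `gpr_get()`, and `memcpy()` stands in for `membuf_write()`. One detail that may be worth checking in the shorthand: the uapi `PTRACE_EVENTMSG_SYSCALL_*` values are 1 and 2, while the enum added by this patch uses 0 and 1, so the sketch maps between them explicitly rather than assigning `ptrace_message` directly.

```c
#include <assert.h>
#include <string.h>

/* Values as defined in include/uapi/linux/ptrace.h. */
#define PTRACE_EVENTMSG_SYSCALL_ENTRY	1
#define PTRACE_EVENTMSG_SYSCALL_EXIT	2

/* Enum introduced by the quoted patch. */
enum ptrace_syscall_dir {
	PTRACE_SYSCALL_ENTER = 0,
	PTRACE_SYSCALL_EXIT,
};

/* Hypothetical stand-in for struct user_pt_regs (x0..x30 only). */
struct mock_regs {
	unsigned long regs[31];
};

/*
 * Copy the tracer's view of the registers into *out.
 * x7 is saved and restored unconditionally, so *uregs is
 * unchanged on return whatever ptrace_message holds.
 */
static void mock_gpr_get(struct mock_regs *uregs,
			 unsigned long ptrace_message,
			 struct mock_regs *out)
{
	unsigned long saved_reg;

	assert(uregs && out);
	saved_reg = uregs->regs[7];

	switch (ptrace_message) {
	case PTRACE_EVENTMSG_SYSCALL_ENTRY:
		uregs->regs[7] = PTRACE_SYSCALL_ENTER;
		break;
	case PTRACE_EVENTMSG_SYSCALL_EXIT:
		uregs->regs[7] = PTRACE_SYSCALL_EXIT;
		break;
	}

	/* Stands in for membuf_write(&to, uregs, sizeof(*uregs)). */
	memcpy(out, uregs, sizeof(*out));

	uregs->regs[7] = saved_reg;
}
```

With this shape the restore needs no second test of `ptrace_message`, which is the main simplification being suggested.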


> +
> +	return ret;
>  }
>  
>  static int gpr_set(struct task_struct *target, const struct user_regset *regset,
> @@ -1221,10 +1248,40 @@ static int compat_gpr_get(struct task_struct *target,
>  			  const struct user_regset *regset,
>  			  struct membuf to)
>  {
> +	compat_ulong_t r12;
> +	bool overwrite_r12;
>  	int i = 0;
>  
> -	while (to.left)
> -		membuf_store(&to, compat_get_user_reg(target, i++));
> +	/*
> +	 * We have some ABI weirdness here in the way that we handle syscall
> +	 * exit stops because we indicate whether or not the stop has been
> +	 * signalled from syscall entry or syscall exit by clobbering the
> +	 * general purpose register r12.
> +	 */
> +	switch (target->ptrace_message) {
> +	case PTRACE_EVENTMSG_SYSCALL_ENTRY:
> +		r12 = PTRACE_SYSCALL_ENTER;
> +		overwrite_r12 = true;
> +		break;
> +	case PTRACE_EVENTMSG_SYSCALL_EXIT:
> +		r12 = PTRACE_SYSCALL_EXIT;
> +		overwrite_r12 = true;
> +		break;
> +	default:
> +		overwrite_r12 = false;
> +		break;
> +	}
> +
> +	while (to.left) {
> +		compat_ulong_t val;
> +
> +		if (!overwrite_r12 || i != 12)
> +			val = compat_get_user_reg(target, i++);
> +		else
> +			val = r12;
> +		membuf_store(&to, val);
> +	}
> +

Can this be condensed too, say by introducing a wrapper for
compat_get_user_reg() that does the fudging on r12?
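
A rough sketch of what such a wrapper could look like, written as a userspace mock so it can be compiled on its own. The names `compat_get_traced_reg()`, `struct mock_task`, `mock_compat_get_user_reg()` and `MOCK_NREGS` are all hypothetical; only `compat_get_user_reg()` and the constants come from the kernel and the quoted patch. A side effect of this shape is that the register index advances in exactly one place for every register, whereas the `else` branch of the quoted loop does not appear to advance `i` after substituting r12.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t compat_ulong_t;	/* u32 in the kernel */

/* Values as defined in include/uapi/linux/ptrace.h. */
#define PTRACE_EVENTMSG_SYSCALL_ENTRY	1
#define PTRACE_EVENTMSG_SYSCALL_EXIT	2

/* Enum introduced by the quoted patch. */
enum ptrace_syscall_dir {
	PTRACE_SYSCALL_ENTER = 0,
	PTRACE_SYSCALL_EXIT,
};

#define MOCK_NREGS	18	/* mock size for the compat GPR set */

/* Hypothetical stand-in for the parts of task_struct used here. */
struct mock_task {
	unsigned long ptrace_message;
	compat_ulong_t regs[MOCK_NREGS];
};

/* Mock of compat_get_user_reg(): returns the real register value. */
static compat_ulong_t mock_compat_get_user_reg(const struct mock_task *t,
					       int idx)
{
	return t->regs[idx];
}

/* Hypothetical wrapper: the tracer's view, with r12 fudged in syscall stops. */
static compat_ulong_t compat_get_traced_reg(const struct mock_task *t, int idx)
{
	if (idx == 12) {
		switch (t->ptrace_message) {
		case PTRACE_EVENTMSG_SYSCALL_ENTRY:
			return PTRACE_SYSCALL_ENTER;
		case PTRACE_EVENTMSG_SYSCALL_EXIT:
			return PTRACE_SYSCALL_EXIT;
		}
	}
	return mock_compat_get_user_reg(t, idx);
}

/* The copy loop stays as simple as it was before the patch. */
static void mock_compat_gpr_get(const struct mock_task *t,
				compat_ulong_t *out, int count)
{
	int i;

	assert(count <= MOCK_NREGS);
	for (i = 0; i < count; i++)
		out[i] = compat_get_traced_reg(t, i);
}
```

The task's own registers are never written here, so no save/restore is needed on the compat path.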

[...]

Cheers
---Dave


* Re: [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers
  2021-01-27 14:53   ` Dave Martin
@ 2021-01-29  7:56     ` Andrei Vagin
  0 siblings, 0 replies; 8+ messages in thread
From: Andrei Vagin @ 2021-01-29  7:56 UTC (permalink / raw)
  To: Dave Martin
  Cc: Will Deacon, Catalin Marinas, Oleg Nesterov, linux-arm-kernel,
	linux-kernel

On Wed, Jan 27, 2021 at 02:53:07PM +0000, Dave Martin wrote:
> On Tue, Jan 19, 2021 at 02:06:36PM -0800, Andrei Vagin wrote:
> > This is an alternative to NT_PRSTATUS that clobbers ip/r12 on AArch32,
> > x7 on AArch64 when a tracee is stopped in syscall entry or syscall exit
> > traps.
> > 
> > Signed-off-by: Andrei Vagin <avagin@gmail.com>
> 
> This approach looks like it works, though I still think adding an option
> for this under PTRACE_SETOPTIONS would be less intrusive.

Dave, thank you for the feedback. I will prepare a patch with an option
and then we will see what looks better.

> 
> Adding a shadow regset like this also looks like it would cause the gp
> regs to be pointlessly dumped twice in a core dump.  Avoiding that
> might require hacks in the core code...
> 
> 
> > ---
> >  arch/arm64/kernel/ptrace.c | 39 ++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/elf.h   |  1 +
> >  2 files changed, 40 insertions(+)
> > 
> > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > index 1863f080cb07..b8e4c2ddf636 100644
> > --- a/arch/arm64/kernel/ptrace.c
> > +++ b/arch/arm64/kernel/ptrace.c
> > @@ -591,6 +591,15 @@ static int gpr_get(struct task_struct *target,
> >  	return ret;
> >  }
> >  
> > +static int gpr_get_full(struct task_struct *target,
> > +		   const struct user_regset *regset,
> > +		   struct membuf to)
> > +{
> > +	struct user_pt_regs *uregs = &task_pt_regs(target)->user_regs;
> > +
> > +	return membuf_write(&to, uregs, sizeof(*uregs));
> > +}
> > +
> >  static int gpr_set(struct task_struct *target, const struct user_regset *regset,
> >  		   unsigned int pos, unsigned int count,
> >  		   const void *kbuf, const void __user *ubuf)
> > @@ -1088,6 +1097,7 @@ static int tagged_addr_ctrl_set(struct task_struct *target, const struct
> >  
> >  enum aarch64_regset {
> >  	REGSET_GPR,
> > +	REGSET_GPR_FULL,
> 
> If we go with this approach, "REGSET_GPR_RAW" might be a preferable
> name.  Both regsets represent all the regs ("full"), but REGSET_GPR is
> mangled by the kernel.

I agree that REGSET_GPR_RAW looks better in this case.

> 
> >  	REGSET_FPR,
> >  	REGSET_TLS,
> >  #ifdef CONFIG_HAVE_HW_BREAKPOINT
> > @@ -1119,6 +1129,14 @@ static const struct user_regset aarch64_regsets[] = {
> >  		.regset_get = gpr_get,
> >  		.set = gpr_set
> >  	},
> > +	[REGSET_GPR_FULL] = {
> > +		.core_note_type = NT_ARM_PRSTATUS,

...

> > diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> > index 30f68b42eeb5..a2086d19263a 100644
> > --- a/include/uapi/linux/elf.h
> > +++ b/include/uapi/linux/elf.h
> > @@ -426,6 +426,7 @@ typedef struct elf64_shdr {
> >  #define NT_ARM_PACA_KEYS	0x407	/* ARM pointer authentication address keys */
> >  #define NT_ARM_PACG_KEYS	0x408	/* ARM pointer authentication generic key */
> >  #define NT_ARM_TAGGED_ADDR_CTRL	0x409	/* arm64 tagged address control (prctl()) */
> 
> What happened to 0x40a..0x40f?

shame on me :)

> 
> [...]
> 
> Cheers
> ---Dave


Thread overview: 8+ messages
2021-01-19 22:06 [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps Andrei Vagin
2021-01-19 22:06 ` [PATCH 1/3] arm64/ptrace: don't clobber task registers on syscall entry/exit traps Andrei Vagin
2021-01-27 15:14   ` Dave Martin
2021-01-19 22:06 ` [PATCH 2/3] arm64/ptrace: introduce NT_ARM_PRSTATUS to get a full set of registers Andrei Vagin
2021-01-27 14:53   ` Dave Martin
2021-01-29  7:56     ` Andrei Vagin
2021-01-19 22:06 ` [PATCH 3/3] selftest/arm64/ptrace: add a test for NT_ARM_PRSTATUS Andrei Vagin
2021-01-27  8:10 ` [PATCH 0/3] arm64/ptrace: allow to get all registers on syscall traps Andrei Vagin
