linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] seccomp: Add protection keys into seccomp_data
@ 2018-10-29 11:23 Michael Sammler
  2018-10-29 16:25 ` Kees Cook
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Sammler @ 2018-10-29 11:23 UTC (permalink / raw)
  Cc: Will Drewry, Kees Cook, linux-api, Ram Pai, Andy Lutomirski,
	Michael Sammler, linuxppc-dev

Add the current value of an architecture specific protection keys
register (currently PKRU on x86) to data available for seccomp-bpf
programs to work on. This allows filters based on the currently
enabled protection keys.

Support for protection keys on the POWER architecture is not part of
this patch since I do not have access to a PowerPC, but adding support
for it can be achieved by setting sd->pkeys to the AMR register in
populate_seccomp_data.

One use case for this patch is disabling unnecessary system calls for a
library (e.g. network i/o for a crypto library) while the library runs
without disabling the system calls for the whole program (by changing
the protection keys before and after the library executes). Using this
one could ensure that the library behaves a expected (e.g. the crypto
library not sending private keys to a malicious server).

This patch also enables lightweight sandboxing of untrusted code using
memory protection keys: Protection keys provide memory isolation but
for a sandbox system call isolation is needed as well. This patch
allows writing a seccomp filter to prevent system calls by the
untrusted code while still allowing system calls for the trusted code.

An alternative design would be to extend (c)BPF with a new instruction
to read the state of a protection key. This alternate design would
provide a simpler interface to the user space since the BPF program
would not need to deal with the architecture specific pkeys field in
seccomp_data, but the question is, how much of an advantage this would
be as the nr field in seccomp_data is already architecture specific.
Adding a new instruction for BPF programs is more complicated than
this patch and might be a breaking change.

Results of selftests/seccomp_benchmark.c on a x86 machine with pkeys
support:

With patch:
Benchmarking 33554432 samples...
28.019505558 - 18.676858522 = 9342647036
getpid native: 278 ns
42.279109885 - 28.019657031 = 14259452854
getpid RET_ALLOW: 424 ns
Estimated seccomp overhead per syscall: 146 ns

Without patch:
Benchmarking 33554432 samples...
28.059619466 - 18.706769155 = 9352850311
getpid native: 278 ns
42.299228279 - 28.059761804 = 14239466475
getpid RET_ALLOW: 424 ns
Estimated seccomp overhead per syscall: 146 ns

Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-api@vger.kernel.org
Signed-off-by: Michael Sammler <msammler@mpi-sws.org>
---
Changes to the previous version:
- added motivation, notes about POWER, alternative design and benchmark results to the commit log
- renamed pkru field in seccomp_data to pkeys
- changed size of pkru field to __u64 and removed reserved field
- added test for x86

 arch/mips/kernel/ptrace.c                     |   1 +
 arch/x86/entry/common.c                       |   1 +
 include/uapi/linux/seccomp.h                  |   3 +
 kernel/seccomp.c                              |   1 +
 tools/testing/selftests/seccomp/seccomp_bpf.c | 107 +++++++++++++++++++++++++-
 5 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index e5ba56c0..a58dd04d 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -1277,6 +1277,7 @@ asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall)
 		for (i = 0; i < 6; i++)
 			sd.args[i] = args[i];
 		sd.instruction_pointer = KSTK_EIP(current);
+		sd.pkeys = 0;

 		ret = __secure_computing(&sd);
 		if (ret == -1)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 3b2490b8..20c51bf2 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -98,6 +98,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
 		sd.arch = arch;
 		sd.nr = regs->orig_ax;
 		sd.instruction_pointer = regs->ip;
+		sd.pkeys = read_pkru();
 #ifdef CONFIG_X86_64
 		if (arch == AUDIT_ARCH_X86_64) {
 			sd.args[0] = regs->di;
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 9efc0e73..3aa2d934 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -52,12 +52,15 @@
  * @instruction_pointer: at the time of the system call.
  * @args: up to 6 system call arguments always stored as 64-bit values
  *        regardless of the architecture.
+ * @pkeys: value of an architecture specific protection keys register
+ *         (currently PKRU on x86)
  */
 struct seccomp_data {
 	int nr;
 	__u32 arch;
 	__u64 instruction_pointer;
 	__u64 args[6];
+	__u64 pkeys;
 };

 #endif /* _UAPI_LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index fd023ac2..dfb8b0d6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -91,6 +91,7 @@ static void populate_seccomp_data(struct seccomp_data *sd)
 	sd->args[4] = args[4];
 	sd->args[5] = args[5];
 	sd->instruction_pointer = KSTK_EIP(task);
+	sd->pkeys = 0;
 }

 /**
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index e1473234..f7f8fa6f 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -82,6 +82,7 @@ struct seccomp_data {
 	__u32 arch;
 	__u64 instruction_pointer;
 	__u64 args[6];
+	__u64 pkeys;
 };
 #endif

@@ -732,7 +733,9 @@ TEST(KILL_process)
 TEST(arg_out_of_range)
 {
 	struct sock_filter filter[] = {
-		BPF_STMT(BPF_LD|BPF_W|BPF_ABS, syscall_arg(6)),
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, pkeys)
+				+ sizeof(__u64)),
 		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
 	};
 	struct sock_fprog prog = {
@@ -2933,6 +2936,108 @@ skip:
 	ASSERT_EQ(0, kill(pid, SIGKILL));
 }

+#if defined(__i386__) || defined(__x86_64__)
+static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
+		unsigned int *ecx, unsigned int *edx)
+{
+	/* ecx is often an input as well as an output. */
+	asm volatile(
+		"cpuid;"
+		: "=a" (*eax),
+		  "=b" (*ebx),
+		  "=c" (*ecx),
+		  "=d" (*edx)
+		: "0" (*eax), "2" (*ecx));
+}
+
+/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx) */
+#define X86_FEATURE_PKU        (1<<3) /* Protection Keys for Userspace */
+#define X86_FEATURE_OSPKE      (1<<4) /* OS Protection Keys Enable */
+
+static inline int cpu_has_pku(void)
+{
+	unsigned int eax;
+	unsigned int ebx;
+	unsigned int ecx;
+	unsigned int edx;
+
+	eax = 0x7;
+	ecx = 0x0;
+	__cpuid(&eax, &ebx, &ecx, &edx);
+
+	if (!(ecx & X86_FEATURE_PKU))
+		return 0;
+	if (!(ecx & X86_FEATURE_OSPKE))
+		return 0;
+	return 1;
+}
+
+static inline __u32 read_pkru(void)
+{
+	if (!cpu_has_pku())
+		return 0;
+
+	__u32 ecx = 0;
+	__u32 edx, pkru;
+
+	/*
+	 * "rdpkru" instruction.  Places PKRU contents in to EAX,
+	 * clears EDX and requires that ecx=0.
+	 */
+	asm volatile(".byte 0x0f,0x01,0xee\n\t"
+		     : "=a" (pkru), "=d" (edx)
+		     : "c" (ecx));
+	return pkru;
+}
+
+static inline void write_pkru(__u32 pkru)
+{
+	if (!cpu_has_pku())
+		return;
+
+	__u32 ecx = 0, edx = 0;
+
+	/*
+	 * "wrpkru" instruction.  Loads contents in EAX to PKRU,
+	 * requires that ecx = edx = 0.
+	 */
+	asm volatile(".byte 0x0f,0x01,0xef\n\t"
+		     : : "a" (pkru), "c"(ecx), "d"(edx));
+}
+
+#define TEST_PKRU 0x55555550
+
+TEST_SIGNAL(pkeys_set, SIGSYS)
+{
+	write_pkru(TEST_PKRU);
+	/* read back the written value because pkru might not be supported */
+	__u32 pkru = read_pkru();
+
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, pkeys)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, pkru, 1, 0),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+	long ret;
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+	ASSERT_EQ(0, ret);
+
+	/* should never return. */
+	EXPECT_EQ(0, syscall(__NR_getpid));
+}
+#endif
+
+
 /*
  * TODO:
  * - add microbenchmarks
--
2.11.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 11:23 [RFC PATCH] seccomp: Add protection keys into seccomp_data Michael Sammler
@ 2018-10-29 16:25 ` Kees Cook
  2018-10-29 16:37   ` Dave Hansen
                     ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Kees Cook @ 2018-10-29 16:25 UTC (permalink / raw)
  To: Michael Sammler
  Cc: Will Drewry, Linux API, Dave Hansen, Ram Pai, Andy Lutomirski, PowerPC

On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler <msammler@mpi-sws.org> wrote:
> Add the current value of an architecture specific protection keys
> register (currently PKRU on x86) to data available for seccomp-bpf
> programs to work on. This allows filters based on the currently
> enabled protection keys.
>
> Support for protection keys on the POWER architecture is not part of
> this patch since I do not have access to a PowerPC, but adding support
> for it can be achieved by setting sd->pkeys to the AMR register in
> populate_seccomp_data.
>
> One use case for this patch is disabling unnecessary system calls for a
> library (e.g. network i/o for a crypto library) while the library runs
> without disabling the system calls for the whole program (by changing
> the protection keys before and after the library executes). Using this
> one could ensure that the library behaves a expected (e.g. the crypto
> library not sending private keys to a malicious server).
>
> This patch also enables lightweight sandboxing of untrusted code using
> memory protection keys: Protection keys provide memory isolation but
> for a sandbox system call isolation is needed as well. This patch
> allows writing a seccomp filter to prevent system calls by the
> untrusted code while still allowing system calls for the trusted code.

Isn't PKU instruction based? Couldn't a malicious library just change
the state of the MPK registers? This seems like an easy way to bypass
any filters that used PKU. I'm not convinced this is a meaningful
barrier that should be enforced by seccomp.

This can also be done with a signal handler with SECCOMP_RET_TRAP and
check instruction pointer vs PKU state.

-Kees

>
> An alternative design would be to extend (c)BPF with a new instruction
> to read the state of a protection key. This alternate design would
> provide a simpler interface to the user space since the BPF program
> would not need to deal with the architecture specific pkeys field in
> seccomp_data, but the question is, how much of an advantage this would
> be as the nr field in seccomp_data is already architecture specific.
> Adding a new instruction for BPF programs is more complicated than
> this patch and might be a breaking change.
>
> Results of selftests/seccomp_benchmark.c on a x86 machine with pkeys
> support:
>
> With patch:
> Benchmarking 33554432 samples...
> 28.019505558 - 18.676858522 = 9342647036
> getpid native: 278 ns
> 42.279109885 - 28.019657031 = 14259452854
> getpid RET_ALLOW: 424 ns
> Estimated seccomp overhead per syscall: 146 ns
>
> Without patch:
> Benchmarking 33554432 samples...
> 28.059619466 - 18.706769155 = 9352850311
> getpid native: 278 ns
> 42.299228279 - 28.059761804 = 14239466475
> getpid RET_ALLOW: 424 ns
> Estimated seccomp overhead per syscall: 146 ns
>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Will Drewry <wad@chromium.org>
> Cc: Ram Pai <linuxram@us.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-api@vger.kernel.org
> Signed-off-by: Michael Sammler <msammler@mpi-sws.org>
> ---
> Changes to the previous version:
> - added motivation, notes about POWER, alternative design and benchmark results to the commit log
> - renamed pkru field in seccomp_data to pkeys
> - changed size of pkru field to __u64 and removed reserved field
> - added test for x86
>
>  arch/mips/kernel/ptrace.c                     |   1 +
>  arch/x86/entry/common.c                       |   1 +
>  include/uapi/linux/seccomp.h                  |   3 +
>  kernel/seccomp.c                              |   1 +
>  tools/testing/selftests/seccomp/seccomp_bpf.c | 107 +++++++++++++++++++++++++-
>  5 files changed, 112 insertions(+), 1 deletion(-)
>
> diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
> index e5ba56c0..a58dd04d 100644
> --- a/arch/mips/kernel/ptrace.c
> +++ b/arch/mips/kernel/ptrace.c
> @@ -1277,6 +1277,7 @@ asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall)
>                 for (i = 0; i < 6; i++)
>                         sd.args[i] = args[i];
>                 sd.instruction_pointer = KSTK_EIP(current);
> +               sd.pkeys = 0;
>
>                 ret = __secure_computing(&sd);
>                 if (ret == -1)
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 3b2490b8..20c51bf2 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -98,6 +98,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
>                 sd.arch = arch;
>                 sd.nr = regs->orig_ax;
>                 sd.instruction_pointer = regs->ip;
> +               sd.pkeys = read_pkru();
>  #ifdef CONFIG_X86_64
>                 if (arch == AUDIT_ARCH_X86_64) {
>                         sd.args[0] = regs->di;
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 9efc0e73..3aa2d934 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -52,12 +52,15 @@
>   * @instruction_pointer: at the time of the system call.
>   * @args: up to 6 system call arguments always stored as 64-bit values
>   *        regardless of the architecture.
> + * @pkeys: value of an architecture specific protection keys register
> + *         (currently PKRU on x86)
>   */
>  struct seccomp_data {
>         int nr;
>         __u32 arch;
>         __u64 instruction_pointer;
>         __u64 args[6];
> +       __u64 pkeys;
>  };
>
>  #endif /* _UAPI_LINUX_SECCOMP_H */
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index fd023ac2..dfb8b0d6 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -91,6 +91,7 @@ static void populate_seccomp_data(struct seccomp_data *sd)
>         sd->args[4] = args[4];
>         sd->args[5] = args[5];
>         sd->instruction_pointer = KSTK_EIP(task);
> +       sd->pkeys = 0;
>  }
>
>  /**
> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> index e1473234..f7f8fa6f 100644
> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> @@ -82,6 +82,7 @@ struct seccomp_data {
>         __u32 arch;
>         __u64 instruction_pointer;
>         __u64 args[6];
> +       __u64 pkeys;
>  };
>  #endif
>
> @@ -732,7 +733,9 @@ TEST(KILL_process)
>  TEST(arg_out_of_range)
>  {
>         struct sock_filter filter[] = {
> -               BPF_STMT(BPF_LD|BPF_W|BPF_ABS, syscall_arg(6)),
> +               BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
> +                       offsetof(struct seccomp_data, pkeys)
> +                               + sizeof(__u64)),
>                 BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
>         };
>         struct sock_fprog prog = {
> @@ -2933,6 +2936,108 @@ skip:
>         ASSERT_EQ(0, kill(pid, SIGKILL));
>  }
>
> +#if defined(__i386__) || defined(__x86_64__)
> +static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
> +               unsigned int *ecx, unsigned int *edx)
> +{
> +       /* ecx is often an input as well as an output. */
> +       asm volatile(
> +               "cpuid;"
> +               : "=a" (*eax),
> +                 "=b" (*ebx),
> +                 "=c" (*ecx),
> +                 "=d" (*edx)
> +               : "0" (*eax), "2" (*ecx));
> +}
> +
> +/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx) */
> +#define X86_FEATURE_PKU        (1<<3) /* Protection Keys for Userspace */
> +#define X86_FEATURE_OSPKE      (1<<4) /* OS Protection Keys Enable */
> +
> +static inline int cpu_has_pku(void)
> +{
> +       unsigned int eax;
> +       unsigned int ebx;
> +       unsigned int ecx;
> +       unsigned int edx;
> +
> +       eax = 0x7;
> +       ecx = 0x0;
> +       __cpuid(&eax, &ebx, &ecx, &edx);
> +
> +       if (!(ecx & X86_FEATURE_PKU))
> +               return 0;
> +       if (!(ecx & X86_FEATURE_OSPKE))
> +               return 0;
> +       return 1;
> +}
> +
> +static inline __u32 read_pkru(void)
> +{
> +       if (!cpu_has_pku())
> +               return 0;
> +
> +       __u32 ecx = 0;
> +       __u32 edx, pkru;
> +
> +       /*
> +        * "rdpkru" instruction.  Places PKRU contents in to EAX,
> +        * clears EDX and requires that ecx=0.
> +        */
> +       asm volatile(".byte 0x0f,0x01,0xee\n\t"
> +                    : "=a" (pkru), "=d" (edx)
> +                    : "c" (ecx));
> +       return pkru;
> +}
> +
> +static inline void write_pkru(__u32 pkru)
> +{
> +       if (!cpu_has_pku())
> +               return;
> +
> +       __u32 ecx = 0, edx = 0;
> +
> +       /*
> +        * "wrpkru" instruction.  Loads contents in EAX to PKRU,
> +        * requires that ecx = edx = 0.
> +        */
> +       asm volatile(".byte 0x0f,0x01,0xef\n\t"
> +                    : : "a" (pkru), "c"(ecx), "d"(edx));
> +}
> +
> +#define TEST_PKRU 0x55555550
> +
> +TEST_SIGNAL(pkeys_set, SIGSYS)
> +{
> +       write_pkru(TEST_PKRU);
> +       /* read back the written value because pkru might not be supported */
> +       __u32 pkru = read_pkru();
> +
> +       struct sock_filter filter[] = {
> +               BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
> +                       offsetof(struct seccomp_data, pkeys)),
> +               BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, pkru, 1, 0),
> +               BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> +               BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
> +       };
> +       struct sock_fprog prog = {
> +               .len = (unsigned short)ARRAY_SIZE(filter),
> +               .filter = filter,
> +       };
> +       long ret;
> +
> +       ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> +       ASSERT_EQ(0, ret);
> +
> +       ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
> +       ASSERT_EQ(0, ret);
> +
> +       /* should never return. */
> +       EXPECT_EQ(0, syscall(__NR_getpid));
> +}
> +#endif
> +
> +
>  /*
>   * TODO:
>   * - add microbenchmarks
> --
> 2.11.0



-- 
Kees Cook

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 16:25 ` Kees Cook
@ 2018-10-29 16:37   ` Dave Hansen
  2018-10-29 16:48     ` Jann Horn
  2018-10-29 16:42   ` Jann Horn
  2018-10-29 16:48   ` Ram Pai
  2 siblings, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2018-10-29 16:37 UTC (permalink / raw)
  To: Kees Cook, Michael Sammler
  Cc: Will Drewry, Linux API, Dave Hansen, Ram Pai, Andy Lutomirski, PowerPC

On 10/29/18 9:25 AM, Kees Cook wrote:
> On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler <msammler@mpi-sws.org> wrote:
>> Add the current value of an architecture specific protection keys
>> register (currently PKRU on x86) to data available for seccomp-bpf
>> programs to work on. This allows filters based on the currently
>> enabled protection keys.

How does the current "assignment" of protection keys to the various uses
get communicated to the filter?

I'm not sure this is a great use for PKRU.  I *think* the basic problem
is that you want to communicate some rights information down into a
filter, and you want to communicate it with PKRU.  While it's handy to
have an extra register that nobody (generally) mucks with, I'm not quite
convinced that we want to repurpose it this way.

Also, I'm not sure the kernel provides the PKRU guarantees you want at
the moment.  Our implementation *probably* works, but it's mostly by
accident.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 16:25 ` Kees Cook
  2018-10-29 16:37   ` Dave Hansen
@ 2018-10-29 16:42   ` Jann Horn
  2018-10-29 16:48   ` Ram Pai
  2 siblings, 0 replies; 16+ messages in thread
From: Jann Horn @ 2018-10-29 16:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: wad, Linux API, Dave Hansen, linuxram, Andy Lutomirski, msammler,
	linuxppc-dev

On Mon, Oct 29, 2018 at 5:25 PM Kees Cook <keescook@chromium.org> wrote:
> On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler <msammler@mpi-sws.org> wrote:
> > Add the current value of an architecture specific protection keys
> > register (currently PKRU on x86) to data available for seccomp-bpf
> > programs to work on. This allows filters based on the currently
> > enabled protection keys.
> >
> > Support for protection keys on the POWER architecture is not part of
> > this patch since I do not have access to a PowerPC, but adding support
> > for it can be achieved by setting sd->pkeys to the AMR register in
> > populate_seccomp_data.
> >
> > One use case for this patch is disabling unnecessary system calls for a
> > library (e.g. network i/o for a crypto library) while the library runs
> > without disabling the system calls for the whole program (by changing
> > the protection keys before and after the library executes). Using this
> > one could ensure that the library behaves a expected (e.g. the crypto
> > library not sending private keys to a malicious server).
> >
> > This patch also enables lightweight sandboxing of untrusted code using
> > memory protection keys: Protection keys provide memory isolation but
> > for a sandbox system call isolation is needed as well. This patch
> > allows writing a seccomp filter to prevent system calls by the
> > untrusted code while still allowing system calls for the trusted code.
>
> Isn't PKU instruction based? Couldn't a malicious library just change
> the state of the MPK registers? This seems like an easy way to bypass
> any filters that used PKU. I'm not convinced this is a meaningful
> barrier that should be enforced by seccomp.

The idea is that if you know that there are no unwanted WRPKRU
instructions in your code, and all WRPKRU instructions in your code
are directly followed by code that verifies correctness of the
protection key, then you can implement lightweight sandboxes inside a
single process. See https://arxiv.org/pdf/1801.06822.pdf for one
description of this approach.

> This can also be done with a signal handler with SECCOMP_RET_TRAP and
> check instruction pointer vs PKU state.

But then you can't prevent an attacker from just jumping directly to
your syscall instruction. And it's slow.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 16:25 ` Kees Cook
  2018-10-29 16:37   ` Dave Hansen
  2018-10-29 16:42   ` Jann Horn
@ 2018-10-29 16:48   ` Ram Pai
  2018-10-29 17:05     ` Michael Sammler
  2 siblings, 1 reply; 16+ messages in thread
From: Ram Pai @ 2018-10-29 16:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Will Drewry, Linux API, Dave Hansen, Andy Lutomirski,
	Michael Sammler, PowerPC

On Mon, Oct 29, 2018 at 09:25:15AM -0700, Kees Cook wrote:
> On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler <msammler@mpi-sws.org> wrote:
> > Add the current value of an architecture specific protection keys
> > register (currently PKRU on x86) to data available for seccomp-bpf
> > programs to work on. This allows filters based on the currently
> > enabled protection keys.
> >
> > Support for protection keys on the POWER architecture is not part of
> > this patch since I do not have access to a PowerPC, but adding support
> > for it can be achieved by setting sd->pkeys to the AMR register in
> > populate_seccomp_data.

Maybe you can use a generic read_pkey() function, and each arch can
provide their own implementation.  In the case of powerpc it is
mfspr(SPRN_AMR);

something like the code below might help?

diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 92a9962..d79efe3 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -89,6 +89,11 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
 #define __mm_pkey_is_reserved(pkey) (reserved_allocation_mask & \
 				       pkey_alloc_mask(pkey))
 
+static inline u64 read_pkey()
+{
+        return mfspr(SPRN_AMR);
+}
+
 static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 {
 	if (pkey < 0 || pkey >= arch_max_pkey())
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 19b137f..feea114 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -9,6 +9,11 @@
 extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val);
 
+static inline u64 read_pkey()
+{
+        return read_pkru();
+}
+
 static inline bool arch_pkeys_enabled(void)
 {
 	return boot_cpu_has(X86_FEATURE_OSPKE);
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba97..f1538c7 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -13,6 +13,11 @@
 #define PKEY_DEDICATED_EXECUTE_ONLY 0
 #define ARCH_VM_PKEY_FLAGS 0
 
+static inline u64 read_pkey()
+{
+	return 0;
+}
+
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
 	return 0;

RP

> >
> > One use case for this patch is disabling unnecessary system calls for a
> > library (e.g. network i/o for a crypto library) while the library runs
> > without disabling the system calls for the whole program (by changing
> > the protection keys before and after the library executes). Using this
> > one could ensure that the library behaves a expected (e.g. the crypto
> > library not sending private keys to a malicious server).
> >
> > This patch also enables lightweight sandboxing of untrusted code using
> > memory protection keys: Protection keys provide memory isolation but
> > for a sandbox system call isolation is needed as well. This patch
> > allows writing a seccomp filter to prevent system calls by the
> > untrusted code while still allowing system calls for the trusted code.
> 
> Isn't PKU instruction based? Couldn't a malicious library just change
> the state of the MPK registers? This seems like an easy way to bypass
> any filters that used PKU. I'm not convinced this is a meaningful
> barrier that should be enforced by seccomp.
> 
> This can also be done with a signal handler with SECCOMP_RET_TRAP and
> check instruction pointer vs PKU state.
> 
> -Kees
> 
> >
> > An alternative design would be to extend (c)BPF with a new instruction
> > to read the state of a protection key. This alternate design would
> > provide a simpler interface to the user space since the BPF program
> > would not need to deal with the architecture specific pkeys field in
> > seccomp_data, but the question is, how much of an advantage this would
> > be as the nr field in seccomp_data is already architecture specific.
> > Adding a new instruction for BPF programs is more complicated than
> > this patch and might be a breaking change.
> >
> > Results of selftests/seccomp_benchmark.c on a x86 machine with pkeys
> > support:
> >
> > With patch:
> > Benchmarking 33554432 samples...
> > 28.019505558 - 18.676858522 = 9342647036
> > getpid native: 278 ns
> > 42.279109885 - 28.019657031 = 14259452854
> > getpid RET_ALLOW: 424 ns
> > Estimated seccomp overhead per syscall: 146 ns
> >
> > Without patch:
> > Benchmarking 33554432 samples...
> > 28.059619466 - 18.706769155 = 9352850311
> > getpid native: 278 ns
> > 42.299228279 - 28.059761804 = 14239466475
> > getpid RET_ALLOW: 424 ns
> > Estimated seccomp overhead per syscall: 146 ns
> >
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Will Drewry <wad@chromium.org>
> > Cc: Ram Pai <linuxram@us.ibm.com>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-api@vger.kernel.org
> > Signed-off-by: Michael Sammler <msammler@mpi-sws.org>
> > ---
> > Changes to the previous version:
> > - added motivation, notes about POWER, alternative design and benchmark results to the commit log
> > - renamed pkru field in seccomp_data to pkeys
> > - changed size of pkru field to __u64 and removed reserved field
> > - added test for x86
> >
> >  arch/mips/kernel/ptrace.c                     |   1 +
> >  arch/x86/entry/common.c                       |   1 +
> >  include/uapi/linux/seccomp.h                  |   3 +
> >  kernel/seccomp.c                              |   1 +
> >  tools/testing/selftests/seccomp/seccomp_bpf.c | 107 +++++++++++++++++++++++++-
> >  5 files changed, 112 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
> > index e5ba56c0..a58dd04d 100644
> > --- a/arch/mips/kernel/ptrace.c
> > +++ b/arch/mips/kernel/ptrace.c
> > @@ -1277,6 +1277,7 @@ asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall)
> >                 for (i = 0; i < 6; i++)
> >                         sd.args[i] = args[i];
> >                 sd.instruction_pointer = KSTK_EIP(current);
> > +               sd.pkeys = 0;
> >
> >                 ret = __secure_computing(&sd);
> >                 if (ret == -1)
> > diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> > index 3b2490b8..20c51bf2 100644
> > --- a/arch/x86/entry/common.c
> > +++ b/arch/x86/entry/common.c
> > @@ -98,6 +98,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
> >                 sd.arch = arch;
> >                 sd.nr = regs->orig_ax;
> >                 sd.instruction_pointer = regs->ip;
> > +               sd.pkeys = read_pkru();
> >  #ifdef CONFIG_X86_64
> >                 if (arch == AUDIT_ARCH_X86_64) {
> >                         sd.args[0] = regs->di;
> > diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> > index 9efc0e73..3aa2d934 100644
> > --- a/include/uapi/linux/seccomp.h
> > +++ b/include/uapi/linux/seccomp.h
> > @@ -52,12 +52,15 @@
> >   * @instruction_pointer: at the time of the system call.
> >   * @args: up to 6 system call arguments always stored as 64-bit values
> >   *        regardless of the architecture.
> > + * @pkeys: value of an architecture specific protection keys register
> > + *         (currently PKRU on x86)
> >   */
> >  struct seccomp_data {
> >         int nr;
> >         __u32 arch;
> >         __u64 instruction_pointer;
> >         __u64 args[6];
> > +       __u64 pkeys;
> >  };
> >
> >  #endif /* _UAPI_LINUX_SECCOMP_H */
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index fd023ac2..dfb8b0d6 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -91,6 +91,7 @@ static void populate_seccomp_data(struct seccomp_data *sd)
> >         sd->args[4] = args[4];
> >         sd->args[5] = args[5];
> >         sd->instruction_pointer = KSTK_EIP(task);
> > +       sd->pkeys = 0;
> >  }
> >
> >  /**
> > diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
> > index e1473234..f7f8fa6f 100644
> > --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
> > +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
> > @@ -82,6 +82,7 @@ struct seccomp_data {
> >         __u32 arch;
> >         __u64 instruction_pointer;
> >         __u64 args[6];
> > +       __u64 pkeys;
> >  };
> >  #endif
> >
> > @@ -732,7 +733,9 @@ TEST(KILL_process)
> >  TEST(arg_out_of_range)
> >  {
> >         struct sock_filter filter[] = {
> > -               BPF_STMT(BPF_LD|BPF_W|BPF_ABS, syscall_arg(6)),
> > +               BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
> > +                       offsetof(struct seccomp_data, pkeys)
> > +                               + sizeof(__u64)),
> >                 BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> >         };
> >         struct sock_fprog prog = {
> > @@ -2933,6 +2936,108 @@ skip:
> >         ASSERT_EQ(0, kill(pid, SIGKILL));
> >  }
> >
> > +#if defined(__i386__) || defined(__x86_64__)
> > +static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
> > +               unsigned int *ecx, unsigned int *edx)
> > +{
> > +       /* ecx is often an input as well as an output. */
> > +       asm volatile(
> > +               "cpuid;"
> > +               : "=a" (*eax),
> > +                 "=b" (*ebx),
> > +                 "=c" (*ecx),
> > +                 "=d" (*edx)
> > +               : "0" (*eax), "2" (*ecx));
> > +}
> > +
> > +/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx) */
> > +#define X86_FEATURE_PKU        (1<<3) /* Protection Keys for Userspace */
> > +#define X86_FEATURE_OSPKE      (1<<4) /* OS Protection Keys Enable */
> > +
> > +static inline int cpu_has_pku(void)
> > +{
> > +       unsigned int eax;
> > +       unsigned int ebx;
> > +       unsigned int ecx;
> > +       unsigned int edx;
> > +
> > +       eax = 0x7;
> > +       ecx = 0x0;
> > +       __cpuid(&eax, &ebx, &ecx, &edx);
> > +
> > +       if (!(ecx & X86_FEATURE_PKU))
> > +               return 0;
> > +       if (!(ecx & X86_FEATURE_OSPKE))
> > +               return 0;
> > +       return 1;
> > +}
> > +
> > +static inline __u32 read_pkru(void)
> > +{
> > +       if (!cpu_has_pku())
> > +               return 0;
> > +
> > +       __u32 ecx = 0;
> > +       __u32 edx, pkru;
> > +
> > +       /*
> > +        * "rdpkru" instruction.  Places PKRU contents in to EAX,
> > +        * clears EDX and requires that ecx=0.
> > +        */
> > +       asm volatile(".byte 0x0f,0x01,0xee\n\t"
> > +                    : "=a" (pkru), "=d" (edx)
> > +                    : "c" (ecx));
> > +       return pkru;
> > +}
> > +
> > +static inline void write_pkru(__u32 pkru)
> > +{
> > +       if (!cpu_has_pku())
> > +               return;
> > +
> > +       __u32 ecx = 0, edx = 0;
> > +
> > +       /*
> > +        * "wrpkru" instruction.  Loads contents in EAX to PKRU,
> > +        * requires that ecx = edx = 0.
> > +        */
> > +       asm volatile(".byte 0x0f,0x01,0xef\n\t"
> > +                    : : "a" (pkru), "c"(ecx), "d"(edx));
> > +}
> > +
> > +#define TEST_PKRU 0x55555550
> > +
> > +TEST_SIGNAL(pkeys_set, SIGSYS)
> > +{
> > +       write_pkru(TEST_PKRU);
> > +       /* read back the written value because pkru might not be supported */
> > +       __u32 pkru = read_pkru();
> > +
> > +       struct sock_filter filter[] = {
> > +               BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
> > +                       offsetof(struct seccomp_data, pkeys)),
> > +               BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, pkru, 1, 0),
> > +               BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
> > +               BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
> > +       };
> > +       struct sock_fprog prog = {
> > +               .len = (unsigned short)ARRAY_SIZE(filter),
> > +               .filter = filter,
> > +       };
> > +       long ret;
> > +
> > +       ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
> > +       ASSERT_EQ(0, ret);
> > +
> > +       ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
> > +       ASSERT_EQ(0, ret);
> > +
> > +       /* should never return. */
> > +       EXPECT_EQ(0, syscall(__NR_getpid));
> > +}
> > +#endif
> > +
> > +
> >  /*
> >   * TODO:
> >   * - add microbenchmarks
> > --
> > 2.11.0
> 
> 
> 
> -- 
> Kees Cook

-- 
Ram Pai

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 16:37   ` Dave Hansen
@ 2018-10-29 16:48     ` Jann Horn
  2018-10-29 17:02       ` Michael Sammler
  2018-10-29 17:29       ` Dave Hansen
  0 siblings, 2 replies; 16+ messages in thread
From: Jann Horn @ 2018-10-29 16:48 UTC (permalink / raw)
  To: Dave Hansen
  Cc: wad, Kees Cook, Linux API, Dave Hansen, linuxram,
	Andy Lutomirski, msammler, linuxppc-dev

On Mon, Oct 29, 2018 at 5:37 PM Dave Hansen <dave.hansen@intel.com> wrote:
> On 10/29/18 9:25 AM, Kees Cook wrote:
> > On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler <msammler@mpi-sws.org> wrote:
> >> Add the current value of an architecture specific protection keys
> >> register (currently PKRU on x86) to data available for seccomp-bpf
> >> programs to work on. This allows filters based on the currently
> >> enabled protection keys.
>
> How does the current "assignment" of protection keys to the various uses
> get communicated to the filter?

I assume that you first allocate your protection keys, then install the filter?

> I'm not sure this is a great use for PKRU.  I *think* the basic problem
> is that you want to communicate some rights information down into a
> filter, and you want to communicate it with PKRU.  While it's handy to
> have an extra register that nobody (generally) mucks with, I'm not quite
> convinced that we want to repurpose it this way.

That's not how I understand it; I believe that the context is probably
https://arxiv.org/pdf/1801.06822.pdf ?
My understanding is that PKRU is used for lightweight in-process
sandboxing, and to extend this sandbox protection to the syscall
interface, it is necessary to expose PKRU state to seccomp filters.
In other words, this isn't using PKRU exclusively for passing rights
into a filter, but it has to use PKRU anyway.

> Also, I'm not sure the kernel provides the PKRU guarantees you want at
> the moment.  Our implementation *probably* works, but it's mostly by
> accident.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 16:48     ` Jann Horn
@ 2018-10-29 17:02       ` Michael Sammler
  2018-10-29 17:07         ` Dave Hansen
  2018-10-29 17:29       ` Dave Hansen
  1 sibling, 1 reply; 16+ messages in thread
From: Michael Sammler @ 2018-10-29 17:02 UTC (permalink / raw)
  To: Jann Horn, Dave Hansen
  Cc: wad, Kees Cook, Linux API, Dave Hansen, linuxram,
	Andy Lutomirski, linuxppc-dev

On 10/29/2018 05:48 PM, Jann Horn wrote:
> On Mon, Oct 29, 2018 at 5:37 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> On 10/29/18 9:25 AM, Kees Cook wrote:
>>> On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler <msammler@mpi-sws.org> wrote:
>>>> Add the current value of an architecture specific protection keys
>>>> register (currently PKRU on x86) to data available for seccomp-bpf
>>>> programs to work on. This allows filters based on the currently
>>>> enabled protection keys.
>> How does the current "assignment" of protection keys to the various uses
>> get communicated to the filter?
> I assume that you first allocate your protection keys, then install the filter?
>
Yes, but I agree that it should probably be documented, that the filter 
should only look at the parts of the PKRU, which belong to pkeys the 
user space program allocated (if the kernel wants to use some parts of 
the PKRU for its own purposes).
>> I'm not sure this is a great use for PKRU.  I *think* the basic problem
>> is that you want to communicate some rights information down into a
>> filter, and you want to communicate it with PKRU.  While it's handy to
>> have an extra register that nobody (generally) mucks with, I'm not quite
>> convinced that we want to repurpose it this way.
> That's not how I understand it; I believe that the context is probably
> https://arxiv.org/pdf/1801.06822.pdf ?
> My understanding is that PKRU is used for lightweight in-process
> sandboxing, and to extend this sandbox protection to the syscall
> interface, it is necessary to expose PKRU state to seccomp filters.
> In other words, this isn't using PKRU exclusively for passing rights
> into a filter, but it has to use PKRU anyway.
Yes, https://arxiv.org/pdf/1801.06822.pdf is indeed the context and what 
you say is correct.
>> Also, I'm not sure the kernel provides the PKRU guarantees you want at
>> the moment.  Our implementation *probably* works, but it's mostly by
>> accident.
I don't know, which guarantees about the PKRU are provided at the 
moment, but the only guarantee needed for this patch is, that the kernel 
does not change the bits of the PKRU register, which belong to pkeys 
allocated by the user program, between the syscall entry and the call to 
secure_computing(). Is there are use case where the kernel would like to 
modify these bits of the PKRU?

-- MIchael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 16:48   ` Ram Pai
@ 2018-10-29 17:05     ` Michael Sammler
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Sammler @ 2018-10-29 17:05 UTC (permalink / raw)
  To: Ram Pai, Kees Cook
  Cc: PowerPC, Linux API, Will Drewry, Dave Hansen, Andy Lutomirski

On 10/29/2018 05:48 PM, Ram Pai wrote:

> On Mon, Oct 29, 2018 at 09:25:15AM -0700, Kees Cook wrote:
>> On Mon, Oct 29, 2018 at 4:23 AM, Michael Sammler <msammler@mpi-sws.org> wrote:
>>> Add the current value of an architecture specific protection keys
>>> register (currently PKRU on x86) to data available for seccomp-bpf
>>> programs to work on. This allows filters based on the currently
>>> enabled protection keys.
>>>
>>> Support for protection keys on the POWER architecture is not part of
>>> this patch since I do not have access to a PowerPC, but adding support
>>> for it can be achieved by setting sd->pkeys to the AMR register in
>>> populate_seccomp_data.
> Maybe you can use a generic read_pkey() function, and each arch can
> provide their own implementation.  In the case of powerpc it is
> mfspr(SPRN_AMR);
>
> something like the code below might help?
>
> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> index 92a9962..d79efe3 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -89,6 +89,11 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
>   #define __mm_pkey_is_reserved(pkey) (reserved_allocation_mask & \
>   				       pkey_alloc_mask(pkey))
>   
> +static inline u64 read_pkey()
> +{
> +        return mfspr(SPRN_AMR);
> +}
> +
>   static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
>   {
>   	if (pkey < 0 || pkey >= arch_max_pkey())
> diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
> index 19b137f..feea114 100644
> --- a/arch/x86/include/asm/pkeys.h
> +++ b/arch/x86/include/asm/pkeys.h
> @@ -9,6 +9,11 @@
>   extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
>   		unsigned long init_val);
>   
> +static inline u64 read_pkey()
> +{
> +        return read_pkru();
> +}
> +
>   static inline bool arch_pkeys_enabled(void)
>   {
>   	return boot_cpu_has(X86_FEATURE_OSPKE);
> diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
> index 2955ba97..f1538c7 100644
> --- a/include/linux/pkeys.h
> +++ b/include/linux/pkeys.h
> @@ -13,6 +13,11 @@
>   #define PKEY_DEDICATED_EXECUTE_ONLY 0
>   #define ARCH_VM_PKEY_FLAGS 0
>   
> +static inline u64 read_pkey()
> +{
> +	return 0;
> +}
> +
>   static inline int vma_pkey(struct vm_area_struct *vma)
>   {
>   	return 0;
>
> RP
Thank you very much, this helps indeed. I will add this to the next 
version of this patch.

>>> One use case for this patch is disabling unnecessary system calls for a
>>> library (e.g. network i/o for a crypto library) while the library runs
>>> without disabling the system calls for the whole program (by changing
>>> the protection keys before and after the library executes). Using this
>>> one could ensure that the library behaves a expected (e.g. the crypto
>>> library not sending private keys to a malicious server).
>>>
>>> This patch also enables lightweight sandboxing of untrusted code using
>>> memory protection keys: Protection keys provide memory isolation but
>>> for a sandbox system call isolation is needed as well. This patch
>>> allows writing a seccomp filter to prevent system calls by the
>>> untrusted code while still allowing system calls for the trusted code.
>> Isn't PKU instruction based? Couldn't a malicious library just change
>> the state of the MPK registers? This seems like an easy way to bypass
>> any filters that used PKU. I'm not convinced this is a meaningful
>> barrier that should be enforced by seccomp.
>>
>> This can also be done with a signal handler with SECCOMP_RET_TRAP and
>> check instruction pointer vs PKU state.
>>
>> -Kees
>>
>>> An alternative design would be to extend (c)BPF with a new instruction
>>> to read the state of a protection key. This alternate design would
>>> provide a simpler interface to the user space since the BPF program
>>> would not need to deal with the architecture specific pkeys field in
>>> seccomp_data, but the question is, how much of an advantage this would
>>> be as the nr field in seccomp_data is already architecture specific.
>>> Adding a new instruction for BPF programs is more complicated than
>>> this patch and might be a breaking change.
>>>
>>> Results of selftests/seccomp_benchmark.c on a x86 machine with pkeys
>>> support:
>>>
>>> With patch:
>>> Benchmarking 33554432 samples...
>>> 28.019505558 - 18.676858522 = 9342647036
>>> getpid native: 278 ns
>>> 42.279109885 - 28.019657031 = 14259452854
>>> getpid RET_ALLOW: 424 ns
>>> Estimated seccomp overhead per syscall: 146 ns
>>>
>>> Without patch:
>>> Benchmarking 33554432 samples...
>>> 28.059619466 - 18.706769155 = 9352850311
>>> getpid native: 278 ns
>>> 42.299228279 - 28.059761804 = 14239466475
>>> getpid RET_ALLOW: 424 ns
>>> Estimated seccomp overhead per syscall: 146 ns
>>>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Andy Lutomirski <luto@amacapital.net>
>>> Cc: Will Drewry <wad@chromium.org>
>>> Cc: Ram Pai <linuxram@us.ibm.com>
>>> Cc: linuxppc-dev@lists.ozlabs.org
>>> Cc: linux-api@vger.kernel.org
>>> Signed-off-by: Michael Sammler <msammler@mpi-sws.org>
>>> ---
>>> Changes to the previous version:
>>> - added motivation, notes about POWER, alternative design and benchmark results to the commit log
>>> - renamed pkru field in seccomp_data to pkeys
>>> - changed size of pkru field to __u64 and removed reserved field
>>> - added test for x86
>>>
>>>   arch/mips/kernel/ptrace.c                     |   1 +
>>>   arch/x86/entry/common.c                       |   1 +
>>>   include/uapi/linux/seccomp.h                  |   3 +
>>>   kernel/seccomp.c                              |   1 +
>>>   tools/testing/selftests/seccomp/seccomp_bpf.c | 107 +++++++++++++++++++++++++-
>>>   5 files changed, 112 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
>>> index e5ba56c0..a58dd04d 100644
>>> --- a/arch/mips/kernel/ptrace.c
>>> +++ b/arch/mips/kernel/ptrace.c
>>> @@ -1277,6 +1277,7 @@ asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall)
>>>                  for (i = 0; i < 6; i++)
>>>                          sd.args[i] = args[i];
>>>                  sd.instruction_pointer = KSTK_EIP(current);
>>> +               sd.pkeys = 0;
>>>
>>>                  ret = __secure_computing(&sd);
>>>                  if (ret == -1)
>>> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
>>> index 3b2490b8..20c51bf2 100644
>>> --- a/arch/x86/entry/common.c
>>> +++ b/arch/x86/entry/common.c
>>> @@ -98,6 +98,7 @@ static long syscall_trace_enter(struct pt_regs *regs)
>>>                  sd.arch = arch;
>>>                  sd.nr = regs->orig_ax;
>>>                  sd.instruction_pointer = regs->ip;
>>> +               sd.pkeys = read_pkru();
>>>   #ifdef CONFIG_X86_64
>>>                  if (arch == AUDIT_ARCH_X86_64) {
>>>                          sd.args[0] = regs->di;
>>> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>>> index 9efc0e73..3aa2d934 100644
>>> --- a/include/uapi/linux/seccomp.h
>>> +++ b/include/uapi/linux/seccomp.h
>>> @@ -52,12 +52,15 @@
>>>    * @instruction_pointer: at the time of the system call.
>>>    * @args: up to 6 system call arguments always stored as 64-bit values
>>>    *        regardless of the architecture.
>>> + * @pkeys: value of an architecture specific protection keys register
>>> + *         (currently PKRU on x86)
>>>    */
>>>   struct seccomp_data {
>>>          int nr;
>>>          __u32 arch;
>>>          __u64 instruction_pointer;
>>>          __u64 args[6];
>>> +       __u64 pkeys;
>>>   };
>>>
>>>   #endif /* _UAPI_LINUX_SECCOMP_H */
>>> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>>> index fd023ac2..dfb8b0d6 100644
>>> --- a/kernel/seccomp.c
>>> +++ b/kernel/seccomp.c
>>> @@ -91,6 +91,7 @@ static void populate_seccomp_data(struct seccomp_data *sd)
>>>          sd->args[4] = args[4];
>>>          sd->args[5] = args[5];
>>>          sd->instruction_pointer = KSTK_EIP(task);
>>> +       sd->pkeys = 0;
>>>   }
>>>
>>>   /**
>>> diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
>>> index e1473234..f7f8fa6f 100644
>>> --- a/tools/testing/selftests/seccomp/seccomp_bpf.c
>>> +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
>>> @@ -82,6 +82,7 @@ struct seccomp_data {
>>>          __u32 arch;
>>>          __u64 instruction_pointer;
>>>          __u64 args[6];
>>> +       __u64 pkeys;
>>>   };
>>>   #endif
>>>
>>> @@ -732,7 +733,9 @@ TEST(KILL_process)
>>>   TEST(arg_out_of_range)
>>>   {
>>>          struct sock_filter filter[] = {
>>> -               BPF_STMT(BPF_LD|BPF_W|BPF_ABS, syscall_arg(6)),
>>> +               BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
>>> +                       offsetof(struct seccomp_data, pkeys)
>>> +                               + sizeof(__u64)),
>>>                  BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
>>>          };
>>>          struct sock_fprog prog = {
>>> @@ -2933,6 +2936,108 @@ skip:
>>>          ASSERT_EQ(0, kill(pid, SIGKILL));
>>>   }
>>>
>>> +#if defined(__i386__) || defined(__x86_64__)
>>> +static inline void __cpuid(unsigned int *eax, unsigned int *ebx,
>>> +               unsigned int *ecx, unsigned int *edx)
>>> +{
>>> +       /* ecx is often an input as well as an output. */
>>> +       asm volatile(
>>> +               "cpuid;"
>>> +               : "=a" (*eax),
>>> +                 "=b" (*ebx),
>>> +                 "=c" (*ecx),
>>> +                 "=d" (*edx)
>>> +               : "0" (*eax), "2" (*ecx));
>>> +}
>>> +
>>> +/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx) */
>>> +#define X86_FEATURE_PKU        (1<<3) /* Protection Keys for Userspace */
>>> +#define X86_FEATURE_OSPKE      (1<<4) /* OS Protection Keys Enable */
>>> +
>>> +static inline int cpu_has_pku(void)
>>> +{
>>> +       unsigned int eax;
>>> +       unsigned int ebx;
>>> +       unsigned int ecx;
>>> +       unsigned int edx;
>>> +
>>> +       eax = 0x7;
>>> +       ecx = 0x0;
>>> +       __cpuid(&eax, &ebx, &ecx, &edx);
>>> +
>>> +       if (!(ecx & X86_FEATURE_PKU))
>>> +               return 0;
>>> +       if (!(ecx & X86_FEATURE_OSPKE))
>>> +               return 0;
>>> +       return 1;
>>> +}
>>> +
>>> +static inline __u32 read_pkru(void)
>>> +{
>>> +       if (!cpu_has_pku())
>>> +               return 0;
>>> +
>>> +       __u32 ecx = 0;
>>> +       __u32 edx, pkru;
>>> +
>>> +       /*
>>> +        * "rdpkru" instruction.  Places PKRU contents in to EAX,
>>> +        * clears EDX and requires that ecx=0.
>>> +        */
>>> +       asm volatile(".byte 0x0f,0x01,0xee\n\t"
>>> +                    : "=a" (pkru), "=d" (edx)
>>> +                    : "c" (ecx));
>>> +       return pkru;
>>> +}
>>> +
>>> +static inline void write_pkru(__u32 pkru)
>>> +{
>>> +       if (!cpu_has_pku())
>>> +               return;
>>> +
>>> +       __u32 ecx = 0, edx = 0;
>>> +
>>> +       /*
>>> +        * "wrpkru" instruction.  Loads contents in EAX to PKRU,
>>> +        * requires that ecx = edx = 0.
>>> +        */
>>> +       asm volatile(".byte 0x0f,0x01,0xef\n\t"
>>> +                    : : "a" (pkru), "c"(ecx), "d"(edx));
>>> +}
>>> +
>>> +#define TEST_PKRU 0x55555550
>>> +
>>> +TEST_SIGNAL(pkeys_set, SIGSYS)
>>> +{
>>> +       write_pkru(TEST_PKRU);
>>> +       /* read back the written value because pkru might not be supported */
>>> +       __u32 pkru = read_pkru();
>>> +
>>> +       struct sock_filter filter[] = {
>>> +               BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
>>> +                       offsetof(struct seccomp_data, pkeys)),
>>> +               BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, pkru, 1, 0),
>>> +               BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
>>> +               BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL),
>>> +       };
>>> +       struct sock_fprog prog = {
>>> +               .len = (unsigned short)ARRAY_SIZE(filter),
>>> +               .filter = filter,
>>> +       };
>>> +       long ret;
>>> +
>>> +       ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
>>> +       ASSERT_EQ(0, ret);
>>> +
>>> +       ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
>>> +       ASSERT_EQ(0, ret);
>>> +
>>> +       /* should never return. */
>>> +       EXPECT_EQ(0, syscall(__NR_getpid));
>>> +}
>>> +#endif
>>> +
>>> +
>>>   /*
>>>    * TODO:
>>>    * - add microbenchmarks
>>> --
>>> 2.11.0
>>
>>
>> -- 
>> Kees Cook

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 17:02       ` Michael Sammler
@ 2018-10-29 17:07         ` Dave Hansen
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2018-10-29 17:07 UTC (permalink / raw)
  To: Michael Sammler, Jann Horn
  Cc: wad, Kees Cook, Rik van Riel, Linux API, Dave Hansen, linuxram,
	Andy Lutomirski, linuxppc-dev

On 10/29/18 10:02 AM, Michael Sammler wrote:
>>> Also, I'm not sure the kernel provides the PKRU guarantees you want at
>>> the moment.  Our implementation *probably* works, but it's mostly by
>>> accident.
> I don't know, which guarantees about the PKRU are provided at the
> moment, but the only guarantee needed for this patch is, that the kernel
> does not change the bits of the PKRU register, which belong to pkeys
> allocated by the user program, between the syscall entry and the call to
> secure_computing(). Is there are use case where the kernel would like to
> modify these bits of the PKRU?

We've been talking about doing more lax save/restore of the XSAVE
content (PKRU is part of this content).  We would, for instance, only
restore it when returning to userspace, but PKRU might not be up-to-date
with the value in current->fpu.

It's not a deal-breaker with your approach, it's just something to be
careful of and make sure PKRU is up-to-date before you go use it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 16:48     ` Jann Horn
  2018-10-29 17:02       ` Michael Sammler
@ 2018-10-29 17:29       ` Dave Hansen
  2018-10-29 21:55         ` Michael Sammler
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2018-10-29 17:29 UTC (permalink / raw)
  To: Jann Horn
  Cc: wad, Kees Cook, Linux API, Dave Hansen, linuxram,
	Andy Lutomirski, msammler, linuxppc-dev

On 10/29/18 9:48 AM, Jann Horn wrote:
> On Mon, Oct 29, 2018 at 5:37 PM Dave Hansen <dave.hansen@intel.com> wrote:
>> I'm not sure this is a great use for PKRU.  I *think* the basic problem
>> is that you want to communicate some rights information down into a
>> filter, and you want to communicate it with PKRU.  While it's handy to
>> have an extra register that nobody (generally) mucks with, I'm not quite
>> convinced that we want to repurpose it this way.
> 
> That's not how I understand it; I believe that the context is probably
> https://arxiv.org/pdf/1801.06822.pdf ?
> My understanding is that PKRU is used for lightweight in-process
> sandboxing, and to extend this sandbox protection to the syscall
> interface, it is necessary to expose PKRU state to seccomp filters.
> In other words, this isn't using PKRU exclusively for passing rights
> into a filter, but it has to use PKRU anyway.

PKRU gives information about rights to various bits of application data.
 From that, a seccomp filter can infer the context, and thus the ability
for the code to call a given syscall at a certain point in time.

This makes PKRU an opt-in part of the syscall ABI, which is pretty
interesting.  We _could_ do the same kind of thing with any callee-saved
general purpose register, but PKRU is particularly attractive because
there is only one instruction that writes to it (well, outside of
XSAVE*), and random library code is very unlikely at this point to be
using it.

PKRU getting reset on signals, and the requirement now that it *can't*
be changed if you make syscalls probably needs to get thought about very
carefully before we do this, though.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 17:29       ` Dave Hansen
@ 2018-10-29 21:55         ` Michael Sammler
  2018-10-29 22:33           ` Dave Hansen
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Sammler @ 2018-10-29 21:55 UTC (permalink / raw)
  To: Dave Hansen, Jann Horn
  Cc: wad, Kees Cook, Linux API, Dave Hansen, linuxram,
	Andy Lutomirski, linuxppc-dev

Am 29.10.2018 um 18:29 schrieb Dave Hansen:

> On 10/29/18 9:48 AM, Jann Horn wrote:
>> On Mon, Oct 29, 2018 at 5:37 PM Dave Hansen <dave.hansen@intel.com> wrote:
>>> I'm not sure this is a great use for PKRU.  I *think* the basic problem
>>> is that you want to communicate some rights information down into a
>>> filter, and you want to communicate it with PKRU.  While it's handy to
>>> have an extra register that nobody (generally) mucks with, I'm not quite
>>> convinced that we want to repurpose it this way.
>> That's not how I understand it; I believe that the context is probably
>> https://arxiv.org/pdf/1801.06822.pdf ?
>> My understanding is that PKRU is used for lightweight in-process
>> sandboxing, and to extend this sandbox protection to the syscall
>> interface, it is necessary to expose PKRU state to seccomp filters.
>> In other words, this isn't using PKRU exclusively for passing rights
>> into a filter, but it has to use PKRU anyway.
> PKRU gives information about rights to various bits of application data.
>   From that, a seccomp filter can infer the context, and thus the ability
> for the code to call a given syscall at a certain point in time.
>
> This makes PKRU an opt-in part of the syscall ABI, which is pretty
> interesting.  We _could_ do the same kind of thing with any callee-saved
> general purpose register, but PKRU is particularly attractive because
> there is only one instruction that writes to it (well, outside of
> XSAVE*), and random library code is very unlikely at this point to be
> using it.
I agree with you on the points, why PKRU is particularly attractive but 
I think the most important point is that PKRU is _not_ a general purpose 
register, but is already used to control access to some resource 
(memory). This patch would allow to also control access to another 
resource (system calls) using the PKRU. This is why it makes sense to 
use the PKRU in this patch instead of another callee-saved register.
> PKRU getting reset on signals, and the requirement now that it *can't*
> be changed if you make syscalls probably needs to get thought about very
> carefully before we do this, though.
I am not sure, whether I follow you. Are you saying, that PKRU is 
currently not guaranteed to be preserved across system calls?
This would make it very hard to use protection keys if libc does not 
save and restore the PKRU before/after systemcalls (and I am not aware 
of this).

Or do you mean, that the kernel might want to use the PKRU register for 
its own purposes while it is executing?
Then the solution you proposed in another email in this thread would 
work: instead of providing the seccomp filter with the current value of 
the PKRU (which might be different from what the user space expects) use 
the user space value which must have been saved somewhere (otherwise it 
would not be possible to restore it).

Or are you afraid, that one part of a user space program installs a 
seccomp filter, which blocks system calls based on the PKRU, and another 
part of the same program (maybe a library) changes the PKRU in a way, 
which the first part did not expect and the program dies because it 
tries to do a forbidden system call?
I don't know whether the kernel can (and wants) do anything against 
this. This problem also exists without this patch if you replace system 
call with memory access.

-- Michael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 21:55         ` Michael Sammler
@ 2018-10-29 22:33           ` Dave Hansen
  2018-10-30 10:55             ` Michael Sammler
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2018-10-29 22:33 UTC (permalink / raw)
  To: Michael Sammler, Jann Horn
  Cc: wad, Kees Cook, Linux API, Dave Hansen, linuxram,
	Andy Lutomirski, linuxppc-dev

On 10/29/18 2:55 PM, Michael Sammler wrote:
>> PKRU getting reset on signals, and the requirement now that it *can't*
>> be changed if you make syscalls probably needs to get thought about very
>> carefully before we do this, though.
> I am not sure, whether I follow you. Are you saying, that PKRU is
> currently not guaranteed to be preserved across system calls?
> This would make it very hard to use protection keys if libc does not
> save and restore the PKRU before/after systemcalls (and I am not aware
> of this).

It's preserved *across* system calls, but you have to be a bit careful
using it _inside_ the kernel.  We could context switch off to something
else, and not think that we need to restore PKRU until _just_ before we
return to userspace.

> Or do you mean, that the kernel might want to use the PKRU register for
> its own purposes while it is executing?

That, or we might keep another process's PKRU state in the register if
we don't think anyone is using it.  Now that I think about it, I think
Rik (cc'd, who was working on those patches) *had* to explicitly restore
PKRU because it's hard to tell where we might do a copy_to/from_user()
and need it.

> Then the solution you proposed in another email in this thread would
> work: instead of providing the seccomp filter with the current value of
> the PKRU (which might be different from what the user space expects) use
> the user space value which must have been saved somewhere (otherwise it
> would not be possible to restore it).

Yep, that's the worst-case scenario: either fetch PKRU out of the XSAVE
buffer (current->fpu->something), or just restore them using an existing
API before doing RDPKRU.

But, that's really an implementation detail.  The effect on the ABI and
how this might constrain future pkeys use is my bigger worry.

I'd also want to make sure that your specific use-case is compatible
with all the oddities of pkeys, like the 'clone' and signal behavior.
Some of that is spelled out here:

	http://man7.org/linux/man-pages/man7/pkeys.7.html

One thing that's a worry is that we have never said that you *can't*
write to arbitrary permissions in PKRU.  I can totally see some really
paranoid code saying, "I'm about to do something risky, so I'll turn off
access to *all* pkeys", or " turn off all access except my current
stack".  If they did that, they might also inadvertently disable access
to certain seccomp-restricted syscalls.

We can fix that up by documenting restrictions like "code should never
change the access rights of any pkey other than those that it
allocated", but that doesn't help any old code (of which I hope there is
relatively little).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2018-10-29 22:33           ` Dave Hansen
@ 2018-10-30 10:55             ` Michael Sammler
  0 siblings, 0 replies; 16+ messages in thread
From: Michael Sammler @ 2018-10-30 10:55 UTC (permalink / raw)
  To: Dave Hansen, Jann Horn
  Cc: wad, Kees Cook, Linux API, Dave Hansen, linuxram,
	Andy Lutomirski, linuxppc-dev

On 10/29/2018 11:33 PM, Dave Hansen wrote:
> But, that's really an implementation detail.  The effect on the ABI and
> how this might constrain future pkeys use is my bigger worry.
> 
> I'd also want to make sure that your specific use-case is compatible
> with all the oddities of pkeys, like the 'clone' and signal behavior.
> Some of that is spelled out here:
> 
> 	http://man7.org/linux/man-pages/man7/pkeys.7.html
> 

I think my specific use case is compatible with these oddities since the 
untrusted code is not allowed to install signal handlers or spawn 
threads himself but only through the trusted code, which ensures, that 
the PKRU has the correct value when control passes back to the untrusted 
code. But I agree that these oddities are something a programmer needs 
to take into account if he uses pkeys in general (and this patch adds 
new considerations to it if the program installs a seccomp filter and 
e.g. wants to do system calls from a signal handler).

> One thing that's a worry is that we have never said that you *can't*
> write to arbitrary permissions in PKRU.  I can totally see some really
> paranoid code saying, "I'm about to do something risky, so I'll turn off
> access to *all* pkeys", or " turn off all access except my current
> stack".  If they did that, they might also inadvertently disable access
> to certain seccomp-restricted syscalls.
> 
> We can fix that up by documenting restrictions like "code should never
> change the access rights of any pkey other than those that it
> allocated", but that doesn't help any old code (of which I hope there is
> relatively little).
> 

I can also see paranoid code wanting to do something like you describe 
and I think, that we should try to not forbid this behaviour. 
Documenting restrictions on code which writes to the PKRU as you 
describe would be one way to go, but this would disallow this paranoid 
use case if I understand it correctly because the code would not be 
allowed to disable access to all except of one pkey.

My idea would be to not put the blame on the code which writes to the 
PKRU but on the code which installs the seccomp filter based on the 
PKRU: It has to make sure, that it allows the system calls which the 
paranoid code should be allowed to do when the pkey of the paranoid code 
is the (only) enabled pkey. This could be ensured e.g. by the paranoid 
code providing the pkey it intends to use to the code which installs the 
seccomp filter. This of course means, that you need to know what you are 
doing, when you install a seccomp filter which depends on the PKRU, but 
I think this is already the case when you install a seccomp filter 
without this patch (but maybe someone with more seccomp experience than 
me can say better how much reasoning is needed to install a seccomp 
filter without and with this patch).

-- Michael

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2022-11-15  4:16 ` Michael Sammler
@ 2022-11-16 12:20   ` Stephen Röttger
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Röttger @ 2022-11-16 12:20 UTC (permalink / raw)
  To: Michael Sammler
  Cc: Kees Cook, linux-api, linuxppc-dev, linuxram, luto, wad, Dave Hansen

[-- Attachment #1: Type: text/plain, Size: 1143 bytes --]

On Tue, Nov 15, 2022 at 5:16 AM Michael Sammler <msammler@mpi-sws.org> wrote:
> > We're currently working on a feature in chromium that uses pkeys for
> > in-process isolation. Being able to use the pkey state in the seccomp
> > filter would be pretty useful for this. For example, it would allow
> > us to enforce that no code outside the isolated thread would ever
> > map/mprotect executable memory.
> > We can probably do something similar by adding instruction pointer
> > checks to the seccomp filter, but that feels quite hacky and this
> > feature would make a much nicer implementation.
> >
> > Are there any plans to make a version 2 of this patch?
>
> Thanks for your interest in this patch, but I am now working on other projects and currently don't plan to make a version 2 of this patch.

I'd be happy to take over writing a version 2 for this.

Kees and Dave, does this feature overall look good to you?

From the discussion, I think there are two proposed changes:
* use an architecture-generic interface as Ram Pai suggested (i.e. add
a read_pkey function)
* ensure to restore the pkru value or fetch it from the xsave buffer

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4005 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
  2022-11-14 10:09 Stephen Röttger
@ 2022-11-15  4:16 ` Michael Sammler
  2022-11-16 12:20   ` Stephen Röttger
  0 siblings, 1 reply; 16+ messages in thread
From: Michael Sammler @ 2022-11-15  4:16 UTC (permalink / raw)
  To: Stephen Röttger
  Cc: Kees Cook, linux-api, linuxppc-dev, linuxram, luto, wad



> We're currently working on a feature in chromium that uses pkeys for
> in-process isolation. Being able to use the pkey state in the seccomp
> filter would be pretty useful for this. For example, it would allow
> us to enforce that no code outside the isolated thread would ever
> map/mprotect executable memory.
> We can probably do something similar by adding instruction pointer
> checks to the seccomp filter, but that feels quite hacky and this
> feature would make a much nicer implementation.
>
> Are there any plans to make a version 2 of this patch?

Thanks for your interest in this patch, but I am now working on other projects and currently don't plan to make a version 2 of this patch.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] seccomp: Add protection keys into seccomp_data
@ 2022-11-14 10:09 Stephen Röttger
  2022-11-15  4:16 ` Michael Sammler
  0 siblings, 1 reply; 16+ messages in thread
From: Stephen Röttger @ 2022-11-14 10:09 UTC (permalink / raw)
  To: msammler; +Cc: Kees Cook, linux-api, linuxppc-dev, linuxram, luto, wad

[-- Attachment #1: Type: text/plain, Size: 773 bytes --]

> Add the current value of an architecture specific protection keys
> register (currently PKRU on x86) to data available for seccomp-bpf
> programs to work on. This allows filters based on the currently
> enabled protection keys.

We're currently working on a feature in chromium that uses pkeys for
in-process isolation. Being able to use the pkey state in the seccomp
filter would be pretty useful for this. For example, it would allow
us to enforce that no code outside the isolated thread would ever
map/mprotect executable memory.
We can probably do something similar by adding instruction pointer
checks to the seccomp filter, but that feels quite hacky and this
feature would make a much nicer implementation.

Are there any plans to make a version 2 of this patch?

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4005 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-11-16 12:25 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-29 11:23 [RFC PATCH] seccomp: Add protection keys into seccomp_data Michael Sammler
2018-10-29 16:25 ` Kees Cook
2018-10-29 16:37   ` Dave Hansen
2018-10-29 16:48     ` Jann Horn
2018-10-29 17:02       ` Michael Sammler
2018-10-29 17:07         ` Dave Hansen
2018-10-29 17:29       ` Dave Hansen
2018-10-29 21:55         ` Michael Sammler
2018-10-29 22:33           ` Dave Hansen
2018-10-30 10:55             ` Michael Sammler
2018-10-29 16:42   ` Jann Horn
2018-10-29 16:48   ` Ram Pai
2018-10-29 17:05     ` Michael Sammler
2022-11-14 10:09 Stephen Röttger
2022-11-15  4:16 ` Michael Sammler
2022-11-16 12:20   ` Stephen Röttger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).