* [PATCH v9 0/11] seccomp: add thread sync ability
@ 2014-06-27 23:22 ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

This adds the ability for threads to request seccomp filter
synchronization across their thread group (at filter attach time).
For example, this allows Chrome to make sure its graphics driver
threads are fully confined after seccomp filters have been attached.

To support this, locking of seccomp changes via the thread-group-shared
sighand lock is introduced, along with a refactoring of no_new_privs.
Races with thread creation are handled via delayed duplication of the
seccomp task_struct field.

This includes a new syscall (instead of adding a new prctl option),
as suggested by Andy Lutomirski and Michael Kerrisk.

Thanks!

-Kees

v9:
 - rearranged/split patches to make things more reviewable
 - added use of cred_guard_mutex to solve exec race (oleg, luto)
 - added barriers for TIF_SECCOMP vs seccomp.mode race (oleg, luto)
 - fixed missed copying of nnp state after v8 refactor (oleg)
v8:
 - drop use of tasklist_lock, appears redundant against sighand (oleg)
 - reduced use of smp_load_acquire to logical minimum (oleg)
 - change nnp to a task struct held atomic flags field (oleg, luto)
 - drop needless irqflags changes in fork.c for holding sighand lock (oleg)
 - cleaned up use of thread for-each loop (oleg)
 - rearranged patch order to keep syscall changes adjacent
 - added example code to manpage (mtk)
v7:
 - rebase on Linus's tree (merged with network bpf changes)
 - wrote manpage text documenting API (follows this series)
v6:
 - switch from seccomp-specific lock to thread-group lock to gain atomicity
 - implement seccomp syscall across all architectures with seccomp filter
 - clean up sparse warnings around locking
v5:
 - move includes around (drysdale)
 - drop set_nnp return value (luto)
 - use smp_load_acquire/store_release (luto)
 - merge nnp changes to seccomp always, fewer ifdef (luto)
v4:
 - cleaned up locking further, as noticed by David Drysdale
v3:
 - added SECCOMP_EXT_ACT_FILTER for new filter install options
v2:
 - reworked to avoid clone races



* [PATCH v9 01/11] seccomp: create internal mode-setting function
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

In preparation for having other callers of the seccomp mode-setting
logic, split the prctl entry point away from the core logic that
performs the mode change.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/seccomp.c |   16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 301bbc24739c..afb916c7e890 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -473,7 +473,7 @@ long prctl_get_seccomp(void)
 }
 
 /**
- * prctl_set_seccomp: configures current->seccomp.mode
+ * seccomp_set_mode: internal function for setting seccomp mode
  * @seccomp_mode: requested mode to use
  * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
  *
@@ -486,7 +486,7 @@ long prctl_get_seccomp(void)
  *
  * Returns 0 on success or -EINVAL on failure.
  */
-long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
+static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter)
 {
 	long ret = -EINVAL;
 
@@ -517,3 +517,15 @@ long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 out:
 	return ret;
 }
+
+/**
+ * prctl_set_seccomp: configures current->seccomp.mode
+ * @seccomp_mode: requested mode to use
+ * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
+ *
+ * Returns 0 on success or -EINVAL on failure.
+ */
+long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
+{
+	return seccomp_set_mode(seccomp_mode, filter);
+}
-- 
1.7.9.5



* [PATCH v9 02/11] seccomp: extract check/assign mode helpers
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

To support splitting mode 1 from mode 2, extract the mode checking and
assignment logic into common functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/seccomp.c |   22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index afb916c7e890..03a5959b7930 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -194,7 +194,23 @@ static u32 seccomp_run_filters(int syscall)
 	}
 	return ret;
 }
+#endif /* CONFIG_SECCOMP_FILTER */
 
+static inline bool seccomp_check_mode(unsigned long seccomp_mode)
+{
+	if (current->seccomp.mode && current->seccomp.mode != seccomp_mode)
+		return false;
+
+	return true;
+}
+
+static inline void seccomp_assign_mode(unsigned long seccomp_mode)
+{
+	current->seccomp.mode = seccomp_mode;
+	set_tsk_thread_flag(current, TIF_SECCOMP);
+}
+
+#ifdef CONFIG_SECCOMP_FILTER
 /**
  * seccomp_attach_filter: Attaches a seccomp filter to current.
  * @fprog: BPF program to install
@@ -490,8 +506,7 @@ static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter)
 {
 	long ret = -EINVAL;
 
-	if (current->seccomp.mode &&
-	    current->seccomp.mode != seccomp_mode)
+	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
 
 	switch (seccomp_mode) {
@@ -512,8 +527,7 @@ static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter)
 		goto out;
 	}
 
-	current->seccomp.mode = seccomp_mode;
-	set_thread_flag(TIF_SECCOMP);
+	seccomp_assign_mode(seccomp_mode);
 out:
 	return ret;
 }
-- 
1.7.9.5



* [PATCH v9 03/11] seccomp: split mode setting routines
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

Separates the two mode setting paths to make things more readable with
fewer #ifdefs within function bodies.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/seccomp.c |   71 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 48 insertions(+), 23 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 03a5959b7930..812cea2e7ffb 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -489,48 +489,66 @@ long prctl_get_seccomp(void)
 }
 
 /**
- * seccomp_set_mode: internal function for setting seccomp mode
- * @seccomp_mode: requested mode to use
- * @filter: optional struct sock_fprog for use with SECCOMP_MODE_FILTER
- *
- * This function may be called repeatedly with a @seccomp_mode of
- * SECCOMP_MODE_FILTER to install additional filters.  Every filter
- * successfully installed will be evaluated (in reverse order) for each system
- * call the task makes.
+ * seccomp_set_mode_strict: internal function for setting strict seccomp
  *
  * Once current->seccomp.mode is non-zero, it may not be changed.
  *
  * Returns 0 on success or -EINVAL on failure.
  */
-static long seccomp_set_mode(unsigned long seccomp_mode, char __user *filter)
+static long seccomp_set_mode_strict(void)
 {
+	const unsigned long seccomp_mode = SECCOMP_MODE_STRICT;
 	long ret = -EINVAL;
 
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
 
-	switch (seccomp_mode) {
-	case SECCOMP_MODE_STRICT:
-		ret = 0;
 #ifdef TIF_NOTSC
-		disable_TSC();
+	disable_TSC();
 #endif
-		break;
+	seccomp_assign_mode(seccomp_mode);
+	ret = 0;
+
+out:
+
+	return ret;
+}
+
 #ifdef CONFIG_SECCOMP_FILTER
-	case SECCOMP_MODE_FILTER:
-		ret = seccomp_attach_user_filter(filter);
-		if (ret)
-			goto out;
-		break;
-#endif
-	default:
+/**
+ * seccomp_set_mode_filter: internal function for setting seccomp filter
+ * @filter: struct sock_fprog containing filter
+ *
+ * This function may be called repeatedly to install additional filters.
+ * Every filter successfully installed will be evaluated (in reverse order)
+ * for each system call the task makes.
+ *
+ * Once current->seccomp.mode is non-zero, it may not be changed.
+ *
+ * Returns 0 on success or -EINVAL on failure.
+ */
+static long seccomp_set_mode_filter(char __user *filter)
+{
+	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
+	long ret = -EINVAL;
+
+	if (!seccomp_check_mode(seccomp_mode))
+		goto out;
+
+	ret = seccomp_attach_user_filter(filter);
+	if (ret)
 		goto out;
-	}
 
 	seccomp_assign_mode(seccomp_mode);
 out:
 	return ret;
 }
+#else
+static inline long seccomp_set_mode_filter(char __user *filter)
+{
+	return -EINVAL;
+}
+#endif
 
 /**
  * prctl_set_seccomp: configures current->seccomp.mode
@@ -541,5 +559,12 @@ out:
  */
 long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 {
-	return seccomp_set_mode(seccomp_mode, filter);
+	switch (seccomp_mode) {
+	case SECCOMP_MODE_STRICT:
+		return seccomp_set_mode_strict();
+	case SECCOMP_MODE_FILTER:
+		return seccomp_set_mode_filter(filter);
+	default:
+		return -EINVAL;
+	}
 }
-- 
1.7.9.5



* [PATCH v9 04/11] seccomp: add "seccomp" syscall
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

This adds the new "seccomp" syscall with both an "operation" and "flags"
parameter for future expansion. The third argument is a pointer value,
used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).

In addition to the TSYNC flag added later in this series, there is a
non-zero chance that this syscall could be used in the future for
configuring a fixed argument area for seccomp-tracer-aware processes
to pass syscall arguments in. Hence the name "seccomp", rather than
something narrower like "seccomp_add_filter". Additionally, this
syscall takes an operation, flags, and a user pointer as arguments
because passing everything strictly via a user pointer would mean
seccomp itself would be unable to trivially filter calls to the
seccomp syscall itself.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/Kconfig                      |    1 +
 arch/x86/syscalls/syscall_32.tbl  |    1 +
 arch/x86/syscalls/syscall_64.tbl  |    1 +
 include/linux/syscalls.h          |    2 ++
 include/uapi/asm-generic/unistd.h |    4 ++-
 include/uapi/linux/seccomp.h      |    4 +++
 kernel/seccomp.c                  |   55 +++++++++++++++++++++++++++++++++----
 kernel/sys_ni.c                   |    3 ++
 8 files changed, 65 insertions(+), 6 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 97ff872c7acc..0eae9df35b88 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -321,6 +321,7 @@ config HAVE_ARCH_SECCOMP_FILTER
 	  - secure_computing is called from a ptrace_event()-safe context
 	  - secure_computing return value is checked and a return value of -1
 	    results in the system call being skipped immediately.
+	  - seccomp syscall wired up
 
 config SECCOMP_FILTER
 	def_bool y
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index d6b867921612..7527eac24122 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -360,3 +360,4 @@
 351	i386	sched_setattr		sys_sched_setattr
 352	i386	sched_getattr		sys_sched_getattr
 353	i386	renameat2		sys_renameat2
+354	i386	seccomp			sys_seccomp
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index ec255a1646d2..16272a6c12b7 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -323,6 +323,7 @@
 314	common	sched_setattr		sys_sched_setattr
 315	common	sched_getattr		sys_sched_getattr
 316	common	renameat2		sys_renameat2
+317	common	seccomp			sys_seccomp
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b0881a0ed322..1713977ee26f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
 asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
 			 unsigned long idx1, unsigned long idx2);
 asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
+asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
+			    const char __user *uargs);
 #endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 333640608087..65acbf0e2867 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -699,9 +699,11 @@ __SYSCALL(__NR_sched_setattr, sys_sched_setattr)
 __SYSCALL(__NR_sched_getattr, sys_sched_getattr)
 #define __NR_renameat2 276
 __SYSCALL(__NR_renameat2, sys_renameat2)
+#define __NR_seccomp 277
+__SYSCALL(__NR_seccomp, sys_seccomp)
 
 #undef __NR_syscalls
-#define __NR_syscalls 277
+#define __NR_syscalls 278
 
 /*
  * All syscalls below here should go away really,
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index ac2dc9f72973..b258878ba754 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -10,6 +10,10 @@
 #define SECCOMP_MODE_STRICT	1 /* uses hard-coded filter. */
 #define SECCOMP_MODE_FILTER	2 /* uses user-supplied filter. */
 
+/* Valid operations for seccomp syscall. */
+#define SECCOMP_SET_MODE_STRICT	0
+#define SECCOMP_SET_MODE_FILTER	1
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 812cea2e7ffb..2f83496d6016 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -18,6 +18,7 @@
 #include <linux/compat.h>
 #include <linux/sched.h>
 #include <linux/seccomp.h>
+#include <linux/syscalls.h>
 
 /* #define SECCOMP_DEBUG 1 */
 
@@ -314,7 +315,7 @@ free_prog:
  *
  * Returns 0 on success and non-zero otherwise.
  */
-static long seccomp_attach_user_filter(char __user *user_filter)
+static long seccomp_attach_user_filter(const char __user *user_filter)
 {
 	struct sock_fprog fprog;
 	long ret = -EFAULT;
@@ -517,6 +518,7 @@ out:
 #ifdef CONFIG_SECCOMP_FILTER
 /**
  * seccomp_set_mode_filter: internal function for setting seccomp filter
+ * @flags:  flags to change filter behavior
  * @filter: struct sock_fprog containing filter
  *
  * This function may be called repeatedly to install additional filters.
@@ -527,11 +529,16 @@ out:
  *
  * Returns 0 on success or -EINVAL on failure.
  */
-static long seccomp_set_mode_filter(char __user *filter)
+static long seccomp_set_mode_filter(unsigned int flags,
+				    const char __user *filter)
 {
 	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
 	long ret = -EINVAL;
 
+	/* Validate flags. */
+	if (flags != 0)
+		goto out;
+
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
 
@@ -544,12 +551,35 @@ out:
 	return ret;
 }
 #else
-static inline long seccomp_set_mode_filter(char __user *filter)
+static inline long seccomp_set_mode_filter(unsigned int flags,
+					   const char __user *filter)
 {
 	return -EINVAL;
 }
 #endif
 
+/* Common entry point for both prctl and syscall. */
+static long do_seccomp(unsigned int op, unsigned int flags,
+		       const char __user *uargs)
+{
+	switch (op) {
+	case SECCOMP_SET_MODE_STRICT:
+		if (flags != 0 || uargs != NULL)
+			return -EINVAL;
+		return seccomp_set_mode_strict();
+	case SECCOMP_SET_MODE_FILTER:
+		return seccomp_set_mode_filter(flags, uargs);
+	default:
+		return -EINVAL;
+	}
+}
+
+SYSCALL_DEFINE3(seccomp, unsigned int, op, unsigned int, flags,
+			 const char __user *, uargs)
+{
+	return do_seccomp(op, flags, uargs);
+}
+
 /**
  * prctl_set_seccomp: configures current->seccomp.mode
  * @seccomp_mode: requested mode to use
@@ -559,12 +589,27 @@ static inline long seccomp_set_mode_filter(char __user *filter)
  */
 long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter)
 {
+	unsigned int op;
+	char __user *uargs;
+
 	switch (seccomp_mode) {
 	case SECCOMP_MODE_STRICT:
-		return seccomp_set_mode_strict();
+		op = SECCOMP_SET_MODE_STRICT;
+		/*
+		 * Setting strict mode through prctl always ignored filter,
+		 * so make sure it is always NULL here to pass the internal
+		 * check in do_seccomp().
+		 */
+		uargs = NULL;
+		break;
 	case SECCOMP_MODE_FILTER:
-		return seccomp_set_mode_filter(filter);
+		op = SECCOMP_SET_MODE_FILTER;
+		uargs = filter;
+		break;
 	default:
 		return -EINVAL;
 	}
+
+	/* prctl interface doesn't have flags, so they are always zero. */
+	return do_seccomp(op, 0, uargs);
 }
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 36441b51b5df..2904a2105914 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -213,3 +213,6 @@ cond_syscall(compat_sys_open_by_handle_at);
 
 /* compare kernel pointers */
 cond_syscall(sys_kcmp);
+
+/* operate on Secure Computing state */
+cond_syscall(sys_seccomp);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread


* [PATCH v9 05/11] ARM: add seccomp syscall
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

Wires up the new seccomp syscall.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/arm/include/uapi/asm/unistd.h |    1 +
 arch/arm/kernel/calls.S            |    1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm/include/uapi/asm/unistd.h b/arch/arm/include/uapi/asm/unistd.h
index ba94446c72d9..e21b4a069701 100644
--- a/arch/arm/include/uapi/asm/unistd.h
+++ b/arch/arm/include/uapi/asm/unistd.h
@@ -409,6 +409,7 @@
 #define __NR_sched_setattr		(__NR_SYSCALL_BASE+380)
 #define __NR_sched_getattr		(__NR_SYSCALL_BASE+381)
 #define __NR_renameat2			(__NR_SYSCALL_BASE+382)
+#define __NR_seccomp			(__NR_SYSCALL_BASE+383)
 
 /*
  * This may need to be greater than __NR_last_syscall+1 in order to
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 8f51bdcdacbb..bea85f97f363 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -392,6 +392,7 @@
 /* 380 */	CALL(sys_sched_setattr)
 		CALL(sys_sched_getattr)
 		CALL(sys_renameat2)
+		CALL(sys_seccomp)
 #ifndef syscalls_counted
 .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
 #define syscalls_counted
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread


* [PATCH v9 06/11] MIPS: add seccomp syscall
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

Wires up the new seccomp syscall.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 arch/mips/include/uapi/asm/unistd.h |   15 +++++++++------
 arch/mips/kernel/scall32-o32.S      |    1 +
 arch/mips/kernel/scall64-64.S       |    1 +
 arch/mips/kernel/scall64-n32.S      |    1 +
 arch/mips/kernel/scall64-o32.S      |    1 +
 5 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/mips/include/uapi/asm/unistd.h b/arch/mips/include/uapi/asm/unistd.h
index 5805414777e0..9bc13eaf9d67 100644
--- a/arch/mips/include/uapi/asm/unistd.h
+++ b/arch/mips/include/uapi/asm/unistd.h
@@ -372,16 +372,17 @@
 #define __NR_sched_setattr		(__NR_Linux + 349)
 #define __NR_sched_getattr		(__NR_Linux + 350)
 #define __NR_renameat2			(__NR_Linux + 351)
+#define __NR_seccomp			(__NR_Linux + 352)
 
 /*
  * Offset of the last Linux o32 flavoured syscall
  */
-#define __NR_Linux_syscalls		351
+#define __NR_Linux_syscalls		352
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
 
 #define __NR_O32_Linux			4000
-#define __NR_O32_Linux_syscalls		351
+#define __NR_O32_Linux_syscalls		352
 
 #if _MIPS_SIM == _MIPS_SIM_ABI64
 
@@ -701,16 +702,17 @@
 #define __NR_sched_setattr		(__NR_Linux + 309)
 #define __NR_sched_getattr		(__NR_Linux + 310)
 #define __NR_renameat2			(__NR_Linux + 311)
+#define __NR_seccomp			(__NR_Linux + 312)
 
 /*
  * Offset of the last Linux 64-bit flavoured syscall
  */
-#define __NR_Linux_syscalls		311
+#define __NR_Linux_syscalls		312
 
 #endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
 
 #define __NR_64_Linux			5000
-#define __NR_64_Linux_syscalls		311
+#define __NR_64_Linux_syscalls		312
 
 #if _MIPS_SIM == _MIPS_SIM_NABI32
 
@@ -1034,15 +1036,16 @@
 #define __NR_sched_setattr		(__NR_Linux + 313)
 #define __NR_sched_getattr		(__NR_Linux + 314)
 #define __NR_renameat2			(__NR_Linux + 315)
+#define __NR_seccomp			(__NR_Linux + 316)
 
 /*
  * Offset of the last N32 flavoured syscall
  */
-#define __NR_Linux_syscalls		315
+#define __NR_Linux_syscalls		316
 
 #endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
 
 #define __NR_N32_Linux			6000
-#define __NR_N32_Linux_syscalls		315
+#define __NR_N32_Linux_syscalls		316
 
 #endif /* _UAPI_ASM_UNISTD_H */
diff --git a/arch/mips/kernel/scall32-o32.S b/arch/mips/kernel/scall32-o32.S
index 3245474f19d5..ab02d14f1b5c 100644
--- a/arch/mips/kernel/scall32-o32.S
+++ b/arch/mips/kernel/scall32-o32.S
@@ -578,3 +578,4 @@ EXPORT(sys_call_table)
 	PTR	sys_sched_setattr
 	PTR	sys_sched_getattr		/* 4350 */
 	PTR	sys_renameat2
+	PTR	sys_seccomp
diff --git a/arch/mips/kernel/scall64-64.S b/arch/mips/kernel/scall64-64.S
index be2fedd4ae33..010dccf128ec 100644
--- a/arch/mips/kernel/scall64-64.S
+++ b/arch/mips/kernel/scall64-64.S
@@ -431,4 +431,5 @@ EXPORT(sys_call_table)
 	PTR	sys_sched_setattr
 	PTR	sys_sched_getattr		/* 5310 */
 	PTR	sys_renameat2
+	PTR	sys_seccomp
 	.size	sys_call_table,.-sys_call_table
diff --git a/arch/mips/kernel/scall64-n32.S b/arch/mips/kernel/scall64-n32.S
index c1dbcda4b816..c3b3b6525df5 100644
--- a/arch/mips/kernel/scall64-n32.S
+++ b/arch/mips/kernel/scall64-n32.S
@@ -424,4 +424,5 @@ EXPORT(sysn32_call_table)
 	PTR	sys_sched_setattr
 	PTR	sys_sched_getattr
 	PTR	sys_renameat2			/* 6315 */
+	PTR	sys_seccomp
 	.size	sysn32_call_table,.-sysn32_call_table
diff --git a/arch/mips/kernel/scall64-o32.S b/arch/mips/kernel/scall64-o32.S
index f1343ccd7ed7..bb1550b1f501 100644
--- a/arch/mips/kernel/scall64-o32.S
+++ b/arch/mips/kernel/scall64-o32.S
@@ -557,4 +557,5 @@ EXPORT(sys32_call_table)
 	PTR	sys_sched_setattr
 	PTR	sys_sched_getattr		/* 4350 */
 	PTR	sys_renameat2
+	PTR	sys_seccomp
 	.size	sys32_call_table,.-sys32_call_table
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread


* [PATCH v9 07/11] sched: move no_new_privs into new atomic flags
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

Since seccomp transitions between threads require updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/exec.c                  |    4 ++--
 include/linux/sched.h      |   18 +++++++++++++++---
 kernel/seccomp.c           |    2 +-
 kernel/sys.c               |    4 ++--
 security/apparmor/domain.c |    4 ++--
 5 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index a3d33fe592d6..0f5c272410f6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1234,7 +1234,7 @@ static void check_unsafe_exec(struct linux_binprm *bprm)
 	 * This isn't strictly necessary, but it makes it harder for LSMs to
 	 * mess up.
 	 */
-	if (current->no_new_privs)
+	if (task_no_new_privs(current))
 		bprm->unsafe |= LSM_UNSAFE_NO_NEW_PRIVS;
 
 	t = p;
@@ -1272,7 +1272,7 @@ int prepare_binprm(struct linux_binprm *bprm)
 	bprm->cred->egid = current_egid();
 
 	if (!(bprm->file->f_path.mnt->mnt_flags & MNT_NOSUID) &&
-	    !current->no_new_privs &&
+	    !task_no_new_privs(current) &&
 	    kuid_has_mapping(bprm->cred->user_ns, inode->i_uid) &&
 	    kgid_has_mapping(bprm->cred->user_ns, inode->i_gid)) {
 		/* Set-uid? */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 306f4f0c987a..0fd19055bb64 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1307,13 +1307,12 @@ struct task_struct {
 				 * execve */
 	unsigned in_iowait:1;
 
-	/* task may not gain privileges */
-	unsigned no_new_privs:1;
-
 	/* Revert to default priority/policy when forking */
 	unsigned sched_reset_on_fork:1;
 	unsigned sched_contributes_to_load:1;
 
+	unsigned long atomic_flags; /* Flags needing atomic access. */
+
 	pid_t pid;
 	pid_t tgid;
 
@@ -1967,6 +1966,19 @@ static inline void memalloc_noio_restore(unsigned int flags)
 	current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
 }
 
+/* Per-process atomic flags. */
+#define PFA_NO_NEW_PRIVS 0x00000001	/* May not gain new privileges. */
+
+static inline bool task_no_new_privs(struct task_struct *p)
+{
+	return test_bit(PFA_NO_NEW_PRIVS, &p->atomic_flags);
+}
+
+static inline void task_set_no_new_privs(struct task_struct *p)
+{
+	set_bit(PFA_NO_NEW_PRIVS, &p->atomic_flags);
+}
+
 /*
  * task->jobctl flags
  */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 2f83496d6016..137e40c7ae3b 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -241,7 +241,7 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
 	 * This avoids scenarios where unprivileged tasks can affect the
 	 * behavior of privileged children.
 	 */
-	if (!current->no_new_privs &&
+	if (!task_no_new_privs(current) &&
 	    security_capable_noaudit(current_cred(), current_user_ns(),
 				     CAP_SYS_ADMIN) != 0)
 		return -EACCES;
diff --git a/kernel/sys.c b/kernel/sys.c
index 66a751ebf9d9..ce8129192a26 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1990,12 +1990,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		if (arg2 != 1 || arg3 || arg4 || arg5)
 			return -EINVAL;
 
-		current->no_new_privs = 1;
+		task_set_no_new_privs(current);
 		break;
 	case PR_GET_NO_NEW_PRIVS:
 		if (arg2 || arg3 || arg4 || arg5)
 			return -EINVAL;
-		return current->no_new_privs ? 1 : 0;
+		return task_no_new_privs(current) ? 1 : 0;
 	case PR_GET_THP_DISABLE:
 		if (arg2 || arg3 || arg4 || arg5)
 			return -EINVAL;
diff --git a/security/apparmor/domain.c b/security/apparmor/domain.c
index 452567d3a08e..d97cba3e3849 100644
--- a/security/apparmor/domain.c
+++ b/security/apparmor/domain.c
@@ -621,7 +621,7 @@ int aa_change_hat(const char *hats[], int count, u64 token, bool permtest)
 	 * There is no exception for unconfined as change_hat is not
 	 * available.
 	 */
-	if (current->no_new_privs)
+	if (task_no_new_privs(current))
 		return -EPERM;
 
 	/* released below */
@@ -776,7 +776,7 @@ int aa_change_profile(const char *ns_name, const char *hname, bool onexec,
 	 * no_new_privs is set because this aways results in a reduction
 	 * of permissions.
 	 */
-	if (current->no_new_privs && !unconfined(profile)) {
+	if (task_no_new_privs(current) && !unconfined(profile)) {
 		put_cred(cred);
 		return -EPERM;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread


* [PATCH v9 08/11] seccomp: split filter prep from check and apply
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

In preparation for adding seccomp locking, move filter creation away
from where it is checked and applied. This will allow locking without
any memory allocation happening under the lock. The validation, filter
attachment, and seccomp mode setting can all happen under the future
locks.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/seccomp.c |   93 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 64 insertions(+), 29 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 137e40c7ae3b..502e54d7f86d 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -18,6 +18,7 @@
 #include <linux/compat.h>
 #include <linux/sched.h>
 #include <linux/seccomp.h>
+#include <linux/slab.h>
 #include <linux/syscalls.h>
 
 /* #define SECCOMP_DEBUG 1 */
@@ -27,7 +28,6 @@
 #include <linux/filter.h>
 #include <linux/ptrace.h>
 #include <linux/security.h>
-#include <linux/slab.h>
 #include <linux/tracehook.h>
 #include <linux/uaccess.h>
 
@@ -213,27 +213,21 @@ static inline void seccomp_assign_mode(unsigned long seccomp_mode)
 
 #ifdef CONFIG_SECCOMP_FILTER
 /**
- * seccomp_attach_filter: Attaches a seccomp filter to current.
+ * seccomp_prepare_filter: Prepares a seccomp filter for use.
  * @fprog: BPF program to install
  *
- * Returns 0 on success or an errno on failure.
+ * Returns filter on success or an ERR_PTR on failure.
  */
-static long seccomp_attach_filter(struct sock_fprog *fprog)
+static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
 {
 	struct seccomp_filter *filter;
 	unsigned long fp_size = fprog->len * sizeof(struct sock_filter);
-	unsigned long total_insns = fprog->len;
 	struct sock_filter *fp;
 	int new_len;
 	long ret;
 
 	if (fprog->len == 0 || fprog->len > BPF_MAXINSNS)
-		return -EINVAL;
-
-	for (filter = current->seccomp.filter; filter; filter = filter->prev)
-		total_insns += filter->prog->len + 4;  /* include a 4 instr penalty */
-	if (total_insns > MAX_INSNS_PER_PATH)
-		return -ENOMEM;
+		return ERR_PTR(-EINVAL);
 
 	/*
 	 * Installing a seccomp filter requires that the task has
@@ -244,11 +238,11 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
 	if (!task_no_new_privs(current) &&
 	    security_capable_noaudit(current_cred(), current_user_ns(),
 				     CAP_SYS_ADMIN) != 0)
-		return -EACCES;
+		return ERR_PTR(-EACCES);
 
 	fp = kzalloc(fp_size, GFP_KERNEL|__GFP_NOWARN);
 	if (!fp)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	/* Copy the instructions from fprog. */
 	ret = -EFAULT;
@@ -292,13 +286,7 @@ static long seccomp_attach_filter(struct sock_fprog *fprog)
 
 	sk_filter_select_runtime(filter->prog);
 
-	/*
-	 * If there is an existing filter, make it the prev and don't drop its
-	 * task reference.
-	 */
-	filter->prev = current->seccomp.filter;
-	current->seccomp.filter = filter;
-	return 0;
+	return filter;
 
 free_filter_prog:
 	kfree(filter->prog);
@@ -306,19 +294,20 @@ free_filter:
 	kfree(filter);
 free_prog:
 	kfree(fp);
-	return ret;
+	return ERR_PTR(ret);
 }
 
 /**
- * seccomp_attach_user_filter - attaches a user-supplied sock_fprog
+ * seccomp_prepare_user_filter - prepares a user-supplied sock_fprog
  * @user_filter: pointer to the user data containing a sock_fprog.
  *
  * Returns 0 on success and non-zero otherwise.
  */
-static long seccomp_attach_user_filter(const char __user *user_filter)
+static struct seccomp_filter *
+seccomp_prepare_user_filter(const char __user *user_filter)
 {
 	struct sock_fprog fprog;
-	long ret = -EFAULT;
+	struct seccomp_filter *filter = ERR_PTR(-EFAULT);
 
 #ifdef CONFIG_COMPAT
 	if (is_compat_task()) {
@@ -331,9 +320,39 @@ static long seccomp_attach_user_filter(const char __user *user_filter)
 #endif
 	if (copy_from_user(&fprog, user_filter, sizeof(fprog)))
 		goto out;
-	ret = seccomp_attach_filter(&fprog);
+	filter = seccomp_prepare_filter(&fprog);
 out:
-	return ret;
+	return filter;
+}
+
+/**
+ * seccomp_attach_filter: validate and attach filter
+ * @flags:  flags to change filter behavior
+ * @filter: seccomp filter to add to the current process
+ *
+ * Returns 0 on success, -ve on error.
+ */
+static long seccomp_attach_filter(unsigned int flags,
+				  struct seccomp_filter *filter)
+{
+	unsigned long total_insns;
+	struct seccomp_filter *walker;
+
+	/* Validate resulting filter length. */
+	total_insns = filter->prog->len;
+	for (walker = current->seccomp.filter; walker; walker = walker->prev)
+		total_insns += walker->prog->len + 4;  /* 4 instr penalty */
+	if (total_insns > MAX_INSNS_PER_PATH)
+		return -ENOMEM;
+
+	/*
+	 * If there is an existing filter, make it the prev and don't drop its
+	 * task reference.
+	 */
+	filter->prev = current->seccomp.filter;
+	current->seccomp.filter = filter;
+
+	return 0;
 }
 
 /* get_seccomp_filter - increments the reference count of the filter on @tsk */
@@ -346,6 +365,14 @@ void get_seccomp_filter(struct task_struct *tsk)
 	atomic_inc(&orig->usage);
 }
 
+static inline void seccomp_filter_free(struct seccomp_filter *filter)
+{
+	if (filter) {
+		sk_filter_free(filter->prog);
+		kfree(filter);
+	}
+}
+
 /* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
 void put_seccomp_filter(struct task_struct *tsk)
 {
@@ -354,8 +381,7 @@ void put_seccomp_filter(struct task_struct *tsk)
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
 		orig = orig->prev;
-		sk_filter_free(freeme->prog);
-		kfree(freeme);
+		seccomp_filter_free(freeme);
 	}
 }
 
@@ -533,21 +559,30 @@ static long seccomp_set_mode_filter(unsigned int flags,
 				    const char __user *filter)
 {
 	const unsigned long seccomp_mode = SECCOMP_MODE_FILTER;
+	struct seccomp_filter *prepared = NULL;
 	long ret = -EINVAL;
 
 	/* Validate flags. */
 	if (flags != 0)
 		goto out;
 
+	/* Prepare the new filter before holding any locks. */
+	prepared = seccomp_prepare_user_filter(filter);
+	if (IS_ERR(prepared))
+		return PTR_ERR(prepared);
+
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
 
-	ret = seccomp_attach_user_filter(filter);
+	ret = seccomp_attach_filter(flags, prepared);
 	if (ret)
 		goto out;
+	/* Do not free the successfully attached filter. */
+	prepared = NULL;
 
 	seccomp_assign_mode(seccomp_mode);
 out:
+	seccomp_filter_free(prepared);
 	return ret;
 }
 #else
-- 
1.7.9.5


* [PATCH v9 09/11] seccomp: introduce writer locking
  2014-06-27 23:22 ` Kees Cook
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.

Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to
the seccomp fields are now protected by the sighand spinlock (which
is unique to the thread group). Read access remains lockless because
pointer updates themselves are atomic. However, writes (or cloning)
often entail additional checking (like maximum instruction counts),
which requires locking to perform safely.

In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the sighand lock. Then parent and child are certain to have
the same seccomp state when they exit the lock.

Based on patches by Will Drewry and David Drysdale.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/seccomp.h |    6 +++---
 kernel/fork.c           |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/seccomp.c        |   26 ++++++++++++++++++++------
 3 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 4054b0994071..9ff98b4bfe2e 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -14,11 +14,11 @@ struct seccomp_filter;
  *
  * @mode:  indicates one of the valid values above for controlled
  *         system calls available to a process.
- * @filter: The metadata and ruleset for determining what system calls
- *          are allowed for a task.
+ * @filter: must always point to a valid seccomp-filter or NULL as it is
+ *          accessed without locking during system call entry.
  *
  *          @filter must only be accessed from the context of current as there
- *          is no locking.
+ *          is no read locking.
  */
 struct seccomp {
 	int mode;
diff --git a/kernel/fork.c b/kernel/fork.c
index 6a13c46cd87d..ffc1b43e351f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -315,6 +315,15 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 		goto free_ti;
 
 	tsk->stack = ti;
+#ifdef CONFIG_SECCOMP
+	/*
+	 * We must handle setting up seccomp filters once we're under
+	 * the sighand lock in case orig has changed between now and
+	 * then. Until then, filter must be NULL to avoid messing up
+	 * the usage counts on the error path calling free_task.
+	 */
+	tsk->seccomp.filter = NULL;
+#endif
 
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
@@ -1081,6 +1090,35 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	return 0;
 }
 
+static void copy_seccomp(struct task_struct *p)
+{
+#ifdef CONFIG_SECCOMP
+	/*
+	 * Must be called with sighand->lock held, which is common to
+	 * all threads in the group. Holding cred_guard_mutex is not
+	 * needed because this new task is not yet running and cannot
+	 * be racing exec.
+	 */
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+	/* Ref-count the new filter user, and assign it. */
+	get_seccomp_filter(current);
+	p->seccomp = current->seccomp;
+
+	/*
+	 * Explicitly enable no_new_privs here in case it got set
+	 * between the task_struct being duplicated and holding the
+	 * sighand lock. The seccomp state and nnp must be in sync.
+	 */
+	if (task_no_new_privs(current))
+		task_set_no_new_privs(p);
+
+	/* If we have a seccomp mode, enable the thread flag. */
+	if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
+		set_tsk_thread_flag(p, TIF_SECCOMP);
+#endif
+}
+
 SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
 {
 	current->clear_child_tid = tidptr;
@@ -1196,7 +1234,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		goto fork_out;
 
 	ftrace_graph_init_task(p);
-	get_seccomp_filter(p);
 
 	rt_mutex_init_task(p);
 
@@ -1437,6 +1474,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	spin_lock(&current->sighand->siglock);
 
 	/*
+	 * Copy seccomp details explicitly here, in case they were changed
+	 * before holding sighand lock.
+	 */
+	copy_seccomp(p);
+
+	/*
 	 * Process group and session signals need to be delivered to just the
 	 * parent before the fork or both the parent and the child after the
 	 * fork. Restart if a signal comes in before we add the new process to
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 502e54d7f86d..e1ff2c193190 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -173,12 +173,12 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
  */
 static u32 seccomp_run_filters(int syscall)
 {
-	struct seccomp_filter *f;
+	struct seccomp_filter *f = ACCESS_ONCE(current->seccomp.filter);
 	struct seccomp_data sd;
 	u32 ret = SECCOMP_RET_ALLOW;
 
 	/* Ensure unexpected behavior doesn't result in failing open. */
-	if (WARN_ON(current->seccomp.filter == NULL))
+	if (unlikely(WARN_ON(f == NULL)))
 		return SECCOMP_RET_KILL;
 
 	populate_seccomp_data(&sd);
@@ -187,7 +187,7 @@ static u32 seccomp_run_filters(int syscall)
 	 * All filters in the list are evaluated and the lowest BPF return
 	 * value always takes priority (ignoring the DATA).
 	 */
-	for (f = current->seccomp.filter; f; f = f->prev) {
+	for (; f; f = f->prev) {
 		u32 cur_ret = SK_RUN_FILTER(f->prog, (void *)&sd);
 
 		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
@@ -199,6 +199,8 @@ static u32 seccomp_run_filters(int syscall)
 
 static inline bool seccomp_check_mode(unsigned long seccomp_mode)
 {
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
 	if (current->seccomp.mode && current->seccomp.mode != seccomp_mode)
 		return false;
 
@@ -207,6 +209,8 @@ static inline bool seccomp_check_mode(unsigned long seccomp_mode)
 
 static inline void seccomp_assign_mode(unsigned long seccomp_mode)
 {
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
 	current->seccomp.mode = seccomp_mode;
 	set_tsk_thread_flag(current, TIF_SECCOMP);
 }
@@ -330,6 +334,8 @@ out:
  * @flags:  flags to change filter behavior
  * @filter: seccomp filter to add to the current process
  *
+ * Caller must be holding current->sighand->siglock lock.
+ *
  * Returns 0 on success, -ve on error.
  */
 static long seccomp_attach_filter(unsigned int flags,
@@ -338,6 +344,8 @@ static long seccomp_attach_filter(unsigned int flags,
 	unsigned long total_insns;
 	struct seccomp_filter *walker;
 
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
 	/* Validate resulting filter length. */
 	total_insns = filter->prog->len;
 	for (walker = current->seccomp.filter; walker; walker = walker->prev)
@@ -350,7 +358,7 @@ static long seccomp_attach_filter(unsigned int flags,
 	 * task reference.
 	 */
 	filter->prev = current->seccomp.filter;
-	current->seccomp.filter = filter;
+	smp_store_release(&current->seccomp.filter, filter);
 
 	return 0;
 }
@@ -376,7 +384,7 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
 /* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
 void put_seccomp_filter(struct task_struct *tsk)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
+	struct seccomp_filter *orig = smp_load_acquire(&tsk->seccomp.filter);
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
@@ -527,6 +535,8 @@ static long seccomp_set_mode_strict(void)
 	const unsigned long seccomp_mode = SECCOMP_MODE_STRICT;
 	long ret = -EINVAL;
 
+	spin_lock_irq(&current->sighand->siglock);
+
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
 
@@ -537,6 +547,7 @@ static long seccomp_set_mode_strict(void)
 	ret = 0;
 
 out:
+	spin_unlock_irq(&current->sighand->siglock);
 
 	return ret;
 }
@@ -564,13 +575,15 @@ static long seccomp_set_mode_filter(unsigned int flags,
 
 	/* Validate flags. */
 	if (flags != 0)
-		goto out;
+		return -EINVAL;
 
 	/* Prepare the new filter before holding any locks. */
 	prepared = seccomp_prepare_user_filter(filter);
 	if (IS_ERR(prepared))
 		return PTR_ERR(prepared);
 
+	spin_lock_irq(&current->sighand->siglock);
+
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
 
@@ -582,6 +595,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
 
 	seccomp_assign_mode(seccomp_mode);
 out:
+	spin_unlock_irq(&current->sighand->siglock);
 	seccomp_filter_free(prepared);
 	return ret;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v9 09/11] seccomp: introduce writer locking
@ 2014-06-27 23:22   ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-arm-kernel

Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.

Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to
the seccomp fields are now protected by the sighand spinlock (which
is unique to the thread group). Read access remains lockless because
pointer updates themselves are atomic.  However, writes (or cloning)
often entail additional checking (like maximum instruction counts)
which require locking to perform safely.

In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the sighand lock. Then parent and child are certain have
the same seccomp state when they exit the lock.

Based on patches by Will Drewry and David Drysdale.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 include/linux/seccomp.h |    6 +++---
 kernel/fork.c           |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/seccomp.c        |   26 ++++++++++++++++++++------
 3 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 4054b0994071..9ff98b4bfe2e 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -14,11 +14,11 @@ struct seccomp_filter;
  *
  * @mode:  indicates one of the valid values above for controlled
  *         system calls available to a process.
- * @filter: The metadata and ruleset for determining what system calls
- *          are allowed for a task.
+ * @filter: must always point to a valid seccomp-filter or NULL as it is
+ *          accessed without locking during system call entry.
  *
  *          @filter must only be accessed from the context of current as there
- *          is no locking.
+ *          is no read locking.
  */
 struct seccomp {
 	int mode;
diff --git a/kernel/fork.c b/kernel/fork.c
index 6a13c46cd87d..ffc1b43e351f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -315,6 +315,15 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 		goto free_ti;
 
 	tsk->stack = ti;
+#ifdef CONFIG_SECCOMP
+	/*
+	 * We must handle setting up seccomp filters once we're under
+	 * the sighand lock in case orig has changed between now and
+	 * then. Until then, filter must be NULL to avoid messing up
+	 * the usage counts on the error path calling free_task.
+	 */
+	tsk->seccomp.filter = NULL;
+#endif
 
 	setup_thread_stack(tsk, orig);
 	clear_user_return_notifier(tsk);
@@ -1081,6 +1090,35 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 	return 0;
 }
 
+static void copy_seccomp(struct task_struct *p)
+{
+#ifdef CONFIG_SECCOMP
+	/*
+	 * Must be called with sighand->lock held, which is common to
+	 * all threads in the group. Holding cred_guard_mutex is not
+	 * needed because this new task is not yet running and cannot
+	 * be racing exec.
+	 */
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+	/* Ref-count the new filter user, and assign it. */
+	get_seccomp_filter(current);
+	p->seccomp = current->seccomp;
+
+	/*
+	 * Explicitly enable no_new_privs here in case it got set
+	 * between the task_struct being duplicated and holding the
+	 * sighand lock. The seccomp state and nnp must be in sync.
+	 */
+	if (task_no_new_privs(current))
+		task_set_no_new_privs(p);
+
+	/* If we have a seccomp mode, enable the thread flag. */
+	if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
+		set_tsk_thread_flag(p, TIF_SECCOMP);
+#endif
+}
+
 SYSCALL_DEFINE1(set_tid_address, int __user *, tidptr)
 {
 	current->clear_child_tid = tidptr;
@@ -1196,7 +1234,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 		goto fork_out;
 
 	ftrace_graph_init_task(p);
-	get_seccomp_filter(p);
 
 	rt_mutex_init_task(p);
 
@@ -1437,6 +1474,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	spin_lock(&current->sighand->siglock);
 
 	/*
+	 * Copy seccomp details explicitly here, in case they were changed
+	 * before holding sighand lock.
+	 */
+	copy_seccomp(p);
+
+	/*
 	 * Process group and session signals need to be delivered to just the
 	 * parent before the fork or both the parent and the child after the
 	 * fork. Restart if a signal comes in before we add the new process to
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 502e54d7f86d..e1ff2c193190 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -173,12 +173,12 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
  */
 static u32 seccomp_run_filters(int syscall)
 {
-	struct seccomp_filter *f;
+	struct seccomp_filter *f = ACCESS_ONCE(current->seccomp.filter);
 	struct seccomp_data sd;
 	u32 ret = SECCOMP_RET_ALLOW;
 
 	/* Ensure unexpected behavior doesn't result in failing open. */
-	if (WARN_ON(current->seccomp.filter == NULL))
+	if (unlikely(WARN_ON(f == NULL)))
 		return SECCOMP_RET_KILL;
 
 	populate_seccomp_data(&sd);
@@ -187,7 +187,7 @@ static u32 seccomp_run_filters(int syscall)
 	 * All filters in the list are evaluated and the lowest BPF return
 	 * value always takes priority (ignoring the DATA).
 	 */
-	for (f = current->seccomp.filter; f; f = f->prev) {
+	for (; f; f = f->prev) {
 		u32 cur_ret = SK_RUN_FILTER(f->prog, (void *)&sd);
 
 		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
@@ -199,6 +199,8 @@ static u32 seccomp_run_filters(int syscall)
 
 static inline bool seccomp_check_mode(unsigned long seccomp_mode)
 {
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
 	if (current->seccomp.mode && current->seccomp.mode != seccomp_mode)
 		return false;
 
@@ -207,6 +209,8 @@ static inline bool seccomp_check_mode(unsigned long seccomp_mode)
 
 static inline void seccomp_assign_mode(unsigned long seccomp_mode)
 {
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
 	current->seccomp.mode = seccomp_mode;
 	set_tsk_thread_flag(current, TIF_SECCOMP);
 }
@@ -330,6 +334,8 @@ out:
  * @flags:  flags to change filter behavior
  * @filter: seccomp filter to add to the current process
  *
+ * Caller must be holding current->sighand->siglock.
+ *
  * Returns 0 on success, -ve on error.
  */
 static long seccomp_attach_filter(unsigned int flags,
@@ -338,6 +344,8 @@ static long seccomp_attach_filter(unsigned int flags,
 	unsigned long total_insns;
 	struct seccomp_filter *walker;
 
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
 	/* Validate resulting filter length. */
 	total_insns = filter->prog->len;
 	for (walker = current->seccomp.filter; walker; walker = walker->prev)
@@ -350,7 +358,7 @@ static long seccomp_attach_filter(unsigned int flags,
 	 * task reference.
 	 */
 	filter->prev = current->seccomp.filter;
-	current->seccomp.filter = filter;
+	smp_store_release(&current->seccomp.filter, filter);
 
 	return 0;
 }
@@ -376,7 +384,7 @@ static inline void seccomp_filter_free(struct seccomp_filter *filter)
 /* put_seccomp_filter - decrements the ref count of tsk->seccomp.filter */
 void put_seccomp_filter(struct task_struct *tsk)
 {
-	struct seccomp_filter *orig = tsk->seccomp.filter;
+	struct seccomp_filter *orig = smp_load_acquire(&tsk->seccomp.filter);
 	/* Clean up single-reference branches iteratively. */
 	while (orig && atomic_dec_and_test(&orig->usage)) {
 		struct seccomp_filter *freeme = orig;
@@ -527,6 +535,8 @@ static long seccomp_set_mode_strict(void)
 	const unsigned long seccomp_mode = SECCOMP_MODE_STRICT;
 	long ret = -EINVAL;
 
+	spin_lock_irq(&current->sighand->siglock);
+
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
 
@@ -537,6 +547,7 @@ static long seccomp_set_mode_strict(void)
 	ret = 0;
 
 out:
+	spin_unlock_irq(&current->sighand->siglock);
 
 	return ret;
 }
@@ -564,13 +575,15 @@ static long seccomp_set_mode_filter(unsigned int flags,
 
 	/* Validate flags. */
 	if (flags != 0)
-		goto out;
+		return -EINVAL;
 
 	/* Prepare the new filter before holding any locks. */
 	prepared = seccomp_prepare_user_filter(filter);
 	if (IS_ERR(prepared))
 		return PTR_ERR(prepared);
 
+	spin_lock_irq(&current->sighand->siglock);
+
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
 
@@ -582,6 +595,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
 
 	seccomp_assign_mode(seccomp_mode);
 out:
+	spin_unlock_irq(&current->sighand->siglock);
 	seccomp_filter_free(prepared);
 	return ret;
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v9 10/11] seccomp: allow mode setting across threads
  2014-06-27 23:22 ` Kees Cook
  (?)
@ 2014-06-27 23:22   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

This changes the mode-setting helper so that one thread can change the
seccomp mode of another thread in its thread group. We must maintain
barriers to keep TIF_SECCOMP synchronized with the rest of the seccomp
state.
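
The intended barrier pairing, sketched as non-runnable kernel pseudocode
(the function names are the ones touched by this patch):

```c
/* Writer: seccomp_assign_mode(), under sighand->siglock */
task->seccomp.mode = seccomp_mode;       /* mode (and filter) stored first */
smp_mb__before_atomic();                 /* order the stores before the flag */
set_tsk_thread_flag(task, TIF_SECCOMP);  /* flag becomes visible last */

/* Reader: __secure_computing(), entered only after TIF_SECCOMP is seen */
rmb();                                   /* pairs with the writer's barrier */
switch (current->seccomp.mode) { /* ... mode/filter reads are now ordered */ }
```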

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 kernel/seccomp.c |   27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e1ff2c193190..7bbcb9ed16df 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -207,12 +207,18 @@ static inline bool seccomp_check_mode(unsigned long seccomp_mode)
 	return true;
 }
 
-static inline void seccomp_assign_mode(unsigned long seccomp_mode)
+static inline void seccomp_assign_mode(struct task_struct *task,
+				       unsigned long seccomp_mode)
 {
-	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+	BUG_ON(!spin_is_locked(&task->sighand->siglock));
 
-	current->seccomp.mode = seccomp_mode;
-	set_tsk_thread_flag(current, TIF_SECCOMP);
+	task->seccomp.mode = seccomp_mode;
+	/*
+	 * Make sure TIF_SECCOMP cannot be set before the mode (and
+	 * filter) is set.
+	 */
+	smp_mb__before_atomic();
+	set_tsk_thread_flag(task, TIF_SECCOMP);
 }
 
 #ifdef CONFIG_SECCOMP_FILTER
@@ -433,12 +439,17 @@ static int mode1_syscalls_32[] = {
 
 int __secure_computing(int this_syscall)
 {
-	int mode = current->seccomp.mode;
 	int exit_sig = 0;
 	int *syscall;
 	u32 ret;
 
-	switch (mode) {
+	/*
+	 * Make sure that any changes to mode from another thread have
+	 * been seen after TIF_SECCOMP was seen.
+	 */
+	rmb();
+
+	switch (current->seccomp.mode) {
 	case SECCOMP_MODE_STRICT:
 		syscall = mode1_syscalls;
 #ifdef CONFIG_COMPAT
@@ -543,7 +554,7 @@ static long seccomp_set_mode_strict(void)
 #ifdef TIF_NOTSC
 	disable_TSC();
 #endif
-	seccomp_assign_mode(seccomp_mode);
+	seccomp_assign_mode(current, seccomp_mode);
 	ret = 0;
 
 out:
@@ -593,7 +604,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	/* Do not free the successfully attached filter. */
 	prepared = NULL;
 
-	seccomp_assign_mode(seccomp_mode);
+	seccomp_assign_mode(current, seccomp_mode);
 out:
 	spin_unlock_irq(&current->sighand->siglock);
 	seccomp_filter_free(prepared);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread


* [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
  2014-06-27 23:22 ` Kees Cook
  (?)
@ 2014-06-27 23:23   ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Oleg Nesterov, Andy Lutomirski,
	Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process startup, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_DISABLED) are also treated as ancestors, allowing threads to
be transitioned into SECCOMP_MODE_FILTER. If prctl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.

The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.

Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). To make sure that de_thread() is actually able
to kill other threads during an exec, any sighand holders need to check
if they have been scheduled to be killed, and give up on their work.

Based on patches by Will Drewry.

Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/exec.c                    |    2 +-
 include/linux/seccomp.h      |    2 +
 include/uapi/linux/seccomp.h |    3 +
 kernel/seccomp.c             |  139 +++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 0f5c272410f6..ab1f1200ce5d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1216,7 +1216,7 @@ EXPORT_SYMBOL(install_exec_creds);
 /*
  * determine how safe it is to execute the proposed program
  * - the caller must hold ->cred_guard_mutex to protect against
- *   PTRACE_ATTACH
+ *   PTRACE_ATTACH or seccomp thread-sync
  */
 static void check_unsafe_exec(struct linux_binprm *bprm)
 {
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 9ff98b4bfe2e..15de2a711518 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,6 +3,8 @@
 
 #include <uapi/linux/seccomp.h>
 
+#define SECCOMP_FILTER_FLAG_MASK	~(SECCOMP_FILTER_FLAG_TSYNC)
+
 #ifdef CONFIG_SECCOMP
 
 #include <linux/thread_info.h>
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index b258878ba754..0f238a43ff1e 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -14,6 +14,9 @@
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
 
+/* Valid flags for SECCOMP_SET_MODE_FILTER */
+#define SECCOMP_FILTER_FLAG_TSYNC	1
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7bbcb9ed16df..0a82e16da7ef 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -26,6 +26,7 @@
 #ifdef CONFIG_SECCOMP_FILTER
 #include <asm/syscall.h>
 #include <linux/filter.h>
+#include <linux/pid.h>
 #include <linux/ptrace.h>
 #include <linux/security.h>
 #include <linux/tracehook.h>
@@ -222,6 +223,108 @@ static inline void seccomp_assign_mode(struct task_struct *task,
 }
 
 #ifdef CONFIG_SECCOMP_FILTER
+/* Returns 1 if the candidate is an ancestor. */
+static int is_ancestor(struct seccomp_filter *candidate,
+		       struct seccomp_filter *child)
+{
+	/* NULL is the root ancestor. */
+	if (candidate == NULL)
+		return 1;
+	for (; child; child = child->prev)
+		if (child == candidate)
+			return 1;
+	return 0;
+}
+
+/**
+ * seccomp_can_sync_threads: checks if all threads can be synchronized
+ *
+ * Expects sighand and cred_guard_mutex locks to be held.
+ *
+ * Returns 0 on success, -ve on error, or the pid of a thread which was
+ * either not in the correct seccomp mode or did not have an ancestral
+ * seccomp filter.
+ */
+static inline pid_t seccomp_can_sync_threads(void)
+{
+	struct task_struct *thread, *caller;
+
+	BUG_ON(!mutex_is_locked(&current->signal->cred_guard_mutex));
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+	if (current->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EACCES;
+
+	/* Validate all threads being eligible for synchronization. */
+	caller = current;
+	for_each_thread(caller, thread) {
+		pid_t failed;
+
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED ||
+		    (thread->seccomp.mode == SECCOMP_MODE_FILTER &&
+		     is_ancestor(thread->seccomp.filter,
+				 caller->seccomp.filter)))
+			continue;
+
+		/* Return the first thread that cannot be synchronized. */
+		failed = task_pid_vnr(thread);
+		/* If the pid cannot be resolved, then return -ESRCH */
+		if (unlikely(WARN_ON(failed == 0)))
+			failed = -ESRCH;
+		return failed;
+	}
+
+	return 0;
+}
+
+/**
+ * seccomp_sync_threads: sets all threads to use current's filter
+ *
+ * Expects sighand and cred_guard_mutex locks to be held, and for
+ * seccomp_can_sync_threads() to have returned success already
+ * without dropping the locks.
+ *
+ */
+static inline void seccomp_sync_threads(void)
+{
+	struct task_struct *thread, *caller;
+
+	BUG_ON(!mutex_is_locked(&current->signal->cred_guard_mutex));
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+	/* Synchronize all threads. */
+	caller = current;
+	for_each_thread(caller, thread) {
+		/* Get a task reference for the new leaf node. */
+		get_seccomp_filter(caller);
+		/*
+		 * Drop the task reference to the shared ancestor since
+		 * current's path will hold a reference.  (This also
+		 * allows a put before the assignment.)
+		 */
+		put_seccomp_filter(thread);
+		thread->seccomp.filter = caller->seccomp.filter;
+		/*
+		 * Opt the other thread into seccomp if needed.
+		 * As threads are considered to be trust-realm
+		 * equivalent (see ptrace_may_access), it is safe to
+		 * allow one thread to transition the other.
+		 */
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
+			/*
+			 * Don't let an unprivileged task work around
+			 * the no_new_privs restriction by creating
+			 * a thread that sets it up, enters seccomp,
+			 * then dies.
+			 */
+			if (task_no_new_privs(caller))
+				task_set_no_new_privs(thread);
+
+			seccomp_assign_mode(thread, SECCOMP_MODE_FILTER);
+		}
+	}
+}
+
 /**
  * seccomp_prepare_filter: Prepares a seccomp filter for use.
  * @fprog: BPF program to install
@@ -359,6 +462,15 @@ static long seccomp_attach_filter(unsigned int flags,
 	if (total_insns > MAX_INSNS_PER_PATH)
 		return -ENOMEM;
 
+	/* If thread sync has been requested, check that it is possible. */
+	if (flags & SECCOMP_FILTER_FLAG_TSYNC) {
+		int ret;
+
+		ret = seccomp_can_sync_threads();
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * If there is an existing filter, make it the prev and don't drop its
 	 * task reference.
@@ -366,6 +478,10 @@ static long seccomp_attach_filter(unsigned int flags,
 	filter->prev = current->seccomp.filter;
 	smp_store_release(&current->seccomp.filter, filter);
 
+	/* Now that the new filter is in place, synchronize to all threads. */
+	if (flags & SECCOMP_FILTER_FLAG_TSYNC)
+		seccomp_sync_threads();
+
 	return 0;
 }
 
@@ -547,6 +663,11 @@ static long seccomp_set_mode_strict(void)
 	long ret = -EINVAL;
 
 	spin_lock_irq(&current->sighand->siglock);
+	if (unlikely(signal_group_exit(current->signal))) {
+		/* If thread is dying, return to process the signal. */
+		ret = -EAGAIN;
+		goto out;
+	}
 
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
@@ -585,7 +706,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	long ret = -EINVAL;
 
 	/* Validate flags. */
-	if (flags != 0)
+	if (flags & SECCOMP_FILTER_FLAG_MASK)
 		return -EINVAL;
 
 	/* Prepare the new filter before holding any locks. */
@@ -593,7 +714,20 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	if (IS_ERR(prepared))
 		return PTR_ERR(prepared);
 
+	/*
+	 * Make sure we cannot change seccomp or nnp state via TSYNC
+	 * while another thread is in the middle of calling exec.
+	 */
+	if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
+	    mutex_lock_killable(&current->signal->cred_guard_mutex))
+		goto out_free;
+
 	spin_lock_irq(&current->sighand->siglock);
+	if (unlikely(signal_group_exit(current->signal))) {
+		/* If thread is dying, return to process the signal. */
+		ret = -EAGAIN;
+		goto out;
+	}
 
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
@@ -607,6 +741,9 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	seccomp_assign_mode(current, seccomp_mode);
 out:
 	spin_unlock_irq(&current->sighand->siglock);
+	if (flags & SECCOMP_FILTER_FLAG_TSYNC)
+		mutex_unlock(&current->signal->cred_guard_mutex);
+out_free:
 	seccomp_filter_free(prepared);
 	return ret;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
@ 2014-06-27 23:23   ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arch, linux-mips, Will Drewry, Kees Cook,
	linux-security-module, linux-api, x86, Oleg Nesterov,
	Andy Lutomirski, Daniel Borkmann, Julien Tinnes,
	Michael Kerrisk (man-pages),
	Andrew Morton, David Drysdale, linux-arm-kernel,
	Alexei Starovoitov

Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctrl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.

The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.

Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). To make sure that de_thread() is actually able
to kill other threads during an exec, any sighand holders need to check
if they've been scheduled to be killed, and to give up on their work.

Based on patches by Will Drewry.

Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/exec.c                    |    2 +-
 include/linux/seccomp.h      |    2 +
 include/uapi/linux/seccomp.h |    3 +
 kernel/seccomp.c             |  139 +++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 0f5c272410f6..ab1f1200ce5d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1216,7 +1216,7 @@ EXPORT_SYMBOL(install_exec_creds);
 /*
  * determine how safe it is to execute the proposed program
  * - the caller must hold ->cred_guard_mutex to protect against
- *   PTRACE_ATTACH
+ *   PTRACE_ATTACH or seccomp thread-sync
  */
 static void check_unsafe_exec(struct linux_binprm *bprm)
 {
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 9ff98b4bfe2e..15de2a711518 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,6 +3,8 @@
 
 #include <uapi/linux/seccomp.h>
 
+#define SECCOMP_FILTER_FLAG_MASK	~(SECCOMP_FILTER_FLAG_TSYNC)
+
 #ifdef CONFIG_SECCOMP
 
 #include <linux/thread_info.h>
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index b258878ba754..0f238a43ff1e 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -14,6 +14,9 @@
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
 
+/* Valid flags for SECCOMP_SET_MODE_FILTER */
+#define SECCOMP_FILTER_FLAG_TSYNC	1
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7bbcb9ed16df..0a82e16da7ef 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -26,6 +26,7 @@
 #ifdef CONFIG_SECCOMP_FILTER
 #include <asm/syscall.h>
 #include <linux/filter.h>
+#include <linux/pid.h>
 #include <linux/ptrace.h>
 #include <linux/security.h>
 #include <linux/tracehook.h>
@@ -222,6 +223,108 @@ static inline void seccomp_assign_mode(struct task_struct *task,
 }
 
 #ifdef CONFIG_SECCOMP_FILTER
+/* Returns 1 if the candidate is an ancestor. */
+static int is_ancestor(struct seccomp_filter *candidate,
+		       struct seccomp_filter *child)
+{
+	/* NULL is the root ancestor. */
+	if (candidate == NULL)
+		return 1;
+	for (; child; child = child->prev)
+		if (child == candidate)
+			return 1;
+	return 0;
+}
+
+/**
+ * seccomp_can_sync_threads: checks if all threads can be synchronized
+ *
+ * Expects sighand and cred_guard_mutex locks to be held.
+ *
+ * Returns 0 on success, -ve on error, or the pid of a thread which was
+ * either not in the correct seccomp mode or it did not have an ancestral
+ * seccomp filter.
+ */
+static inline pid_t seccomp_can_sync_threads(void)
+{
+	struct task_struct *thread, *caller;
+
+	BUG_ON(!mutex_is_locked(&current->signal->cred_guard_mutex));
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+	if (current->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EACCES;
+
+	/* Validate all threads being eligible for synchronization. */
+	caller = current;
+	for_each_thread(caller, thread) {
+		pid_t failed;
+
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED ||
+		    (thread->seccomp.mode == SECCOMP_MODE_FILTER &&
+		     is_ancestor(thread->seccomp.filter,
+				 caller->seccomp.filter)))
+			continue;
+
+		/* Return the first thread that cannot be synchronized. */
+		failed = task_pid_vnr(thread);
+		/* If the pid cannot be resolved, then return -ESRCH */
+		if (unlikely(WARN_ON(failed == 0)))
+			failed = -ESRCH;
+		return failed;
+	}
+
+	return 0;
+}
+
+/**
+ * seccomp_sync_threads: sets all threads to use current's filter
+ *
+ * Expects sighand and cred_guard_mutex locks to be held, and for
+ * seccomp_can_sync_threads() to have returned success already
+ * without dropping the locks.
+ *
+ */
+static inline void seccomp_sync_threads(void)
+{
+	struct task_struct *thread, *caller;
+
+	BUG_ON(!mutex_is_locked(&current->signal->cred_guard_mutex));
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+	/* Synchronize all threads. */
+	caller = current;
+	for_each_thread(caller, thread) {
+		/* Get a task reference for the new leaf node. */
+		get_seccomp_filter(caller);
+		/*
+		 * Drop the task reference to the shared ancestor since
+		 * current's path will hold a reference.  (This also
+		 * allows a put before the assignment.)
+		 */
+		put_seccomp_filter(thread);
+		thread->seccomp.filter = caller->seccomp.filter;
+		/*
+		 * Opt the other thread into seccomp if needed.
+		 * As threads are considered to be trust-realm
+		 * equivalent (see ptrace_may_access), it is safe to
+		 * allow one thread to transition the other.
+		 */
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
+			/*
+			 * Don't let an unprivileged task work around
+			 * the no_new_privs restriction by creating
+			 * a thread that sets it up, enters seccomp,
+			 * then dies.
+			 */
+			if (task_no_new_privs(caller))
+				task_set_no_new_privs(thread);
+
+			seccomp_assign_mode(thread, SECCOMP_MODE_FILTER);
+		}
+	}
+}
+
 /**
  * seccomp_prepare_filter: Prepares a seccomp filter for use.
  * @fprog: BPF program to install
@@ -359,6 +462,15 @@ static long seccomp_attach_filter(unsigned int flags,
 	if (total_insns > MAX_INSNS_PER_PATH)
* [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
@ 2014-06-27 23:23   ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-06-27 23:23 UTC (permalink / raw)
  To: linux-arm-kernel

Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.

This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.

When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter), an attempt will be made to synchronize all threads in current's
thread group to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor of the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors, allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned, and no filters
will have been applied.

The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.

Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). To make sure that de_thread() is actually able
to kill other threads during an exec, any sighand holders need to check
if they've been scheduled to be killed, and to give up on their work.

Based on patches by Will Drewry.

Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/exec.c                    |    2 +-
 include/linux/seccomp.h      |    2 +
 include/uapi/linux/seccomp.h |    3 +
 kernel/seccomp.c             |  139 +++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 0f5c272410f6..ab1f1200ce5d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1216,7 +1216,7 @@ EXPORT_SYMBOL(install_exec_creds);
 /*
  * determine how safe it is to execute the proposed program
  * - the caller must hold ->cred_guard_mutex to protect against
- *   PTRACE_ATTACH
+ *   PTRACE_ATTACH or seccomp thread-sync
  */
 static void check_unsafe_exec(struct linux_binprm *bprm)
 {
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 9ff98b4bfe2e..15de2a711518 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -3,6 +3,8 @@
 
 #include <uapi/linux/seccomp.h>
 
+#define SECCOMP_FILTER_FLAG_MASK	~(SECCOMP_FILTER_FLAG_TSYNC)
+
 #ifdef CONFIG_SECCOMP
 
 #include <linux/thread_info.h>
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index b258878ba754..0f238a43ff1e 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -14,6 +14,9 @@
 #define SECCOMP_SET_MODE_STRICT	0
 #define SECCOMP_SET_MODE_FILTER	1
 
+/* Valid flags for SECCOMP_SET_MODE_FILTER */
+#define SECCOMP_FILTER_FLAG_TSYNC	1
+
 /*
  * All BPF programs must return a 32-bit value.
  * The bottom 16-bits are for optional return data.
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 7bbcb9ed16df..0a82e16da7ef 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -26,6 +26,7 @@
 #ifdef CONFIG_SECCOMP_FILTER
 #include <asm/syscall.h>
 #include <linux/filter.h>
+#include <linux/pid.h>
 #include <linux/ptrace.h>
 #include <linux/security.h>
 #include <linux/tracehook.h>
@@ -222,6 +223,108 @@ static inline void seccomp_assign_mode(struct task_struct *task,
 }
 
 #ifdef CONFIG_SECCOMP_FILTER
+/* Returns 1 if the candidate is an ancestor. */
+static int is_ancestor(struct seccomp_filter *candidate,
+		       struct seccomp_filter *child)
+{
+	/* NULL is the root ancestor. */
+	if (candidate == NULL)
+		return 1;
+	for (; child; child = child->prev)
+		if (child == candidate)
+			return 1;
+	return 0;
+}
+
+/**
+ * seccomp_can_sync_threads: checks if all threads can be synchronized
+ *
+ * Expects sighand and cred_guard_mutex locks to be held.
+ *
+ * Returns 0 on success, -ve on error, or the pid of a thread which was
+ * either not in the correct seccomp mode or did not have an ancestral
+ * seccomp filter.
+ */
+static inline pid_t seccomp_can_sync_threads(void)
+{
+	struct task_struct *thread, *caller;
+
+	BUG_ON(!mutex_is_locked(&current->signal->cred_guard_mutex));
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+	if (current->seccomp.mode != SECCOMP_MODE_FILTER)
+		return -EACCES;
+
+	/* Validate all threads being eligible for synchronization. */
+	caller = current;
+	for_each_thread(caller, thread) {
+		pid_t failed;
+
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED ||
+		    (thread->seccomp.mode == SECCOMP_MODE_FILTER &&
+		     is_ancestor(thread->seccomp.filter,
+				 caller->seccomp.filter)))
+			continue;
+
+		/* Return the first thread that cannot be synchronized. */
+		failed = task_pid_vnr(thread);
+		/* If the pid cannot be resolved, then return -ESRCH */
+		if (unlikely(WARN_ON(failed == 0)))
+			failed = -ESRCH;
+		return failed;
+	}
+
+	return 0;
+}
+
+/**
+ * seccomp_sync_threads: sets all threads to use current's filter
+ *
+ * Expects sighand and cred_guard_mutex locks to be held, and for
+ * seccomp_can_sync_threads() to have returned success already
+ * without dropping the locks.
+ *
+ */
+static inline void seccomp_sync_threads(void)
+{
+	struct task_struct *thread, *caller;
+
+	BUG_ON(!mutex_is_locked(&current->signal->cred_guard_mutex));
+	BUG_ON(!spin_is_locked(&current->sighand->siglock));
+
+	/* Synchronize all threads. */
+	caller = current;
+	for_each_thread(caller, thread) {
+		/* Get a task reference for the new leaf node. */
+		get_seccomp_filter(caller);
+		/*
+		 * Drop the task reference to the shared ancestor since
+		 * current's path will hold a reference.  (This also
+		 * allows a put before the assignment.)
+		 */
+		put_seccomp_filter(thread);
+		thread->seccomp.filter = caller->seccomp.filter;
+		/*
+		 * Opt the other thread into seccomp if needed.
+		 * As threads are considered to be trust-realm
+		 * equivalent (see ptrace_may_access), it is safe to
+		 * allow one thread to transition the other.
+		 */
+		if (thread->seccomp.mode == SECCOMP_MODE_DISABLED) {
+			/*
+			 * Don't let an unprivileged task work around
+			 * the no_new_privs restriction by creating
+			 * a thread that sets it up, enters seccomp,
+			 * then dies.
+			 */
+			if (task_no_new_privs(caller))
+				task_set_no_new_privs(thread);
+
+			seccomp_assign_mode(thread, SECCOMP_MODE_FILTER);
+		}
+	}
+}
+
 /**
  * seccomp_prepare_filter: Prepares a seccomp filter for use.
  * @fprog: BPF program to install
@@ -359,6 +462,15 @@ static long seccomp_attach_filter(unsigned int flags,
 	if (total_insns > MAX_INSNS_PER_PATH)
 		return -ENOMEM;
 
+	/* If thread sync has been requested, check that it is possible. */
+	if (flags & SECCOMP_FILTER_FLAG_TSYNC) {
+		int ret;
+
+		ret = seccomp_can_sync_threads();
+		if (ret)
+			return ret;
+	}
+
 	/*
 	 * If there is an existing filter, make it the prev and don't drop its
 	 * task reference.
@@ -366,6 +478,10 @@ static long seccomp_attach_filter(unsigned int flags,
 	filter->prev = current->seccomp.filter;
 	smp_store_release(&current->seccomp.filter, filter);
 
+	/* Now that the new filter is in place, synchronize to all threads. */
+	if (flags & SECCOMP_FILTER_FLAG_TSYNC)
+		seccomp_sync_threads();
+
 	return 0;
 }
 
@@ -547,6 +663,11 @@ static long seccomp_set_mode_strict(void)
 	long ret = -EINVAL;
 
 	spin_lock_irq(&current->sighand->siglock);
+	if (unlikely(signal_group_exit(current->signal))) {
+		/* If thread is dying, return to process the signal. */
+		ret = -EAGAIN;
+		goto out;
+	}
 
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
@@ -585,7 +706,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	long ret = -EINVAL;
 
 	/* Validate flags. */
-	if (flags != 0)
+	if (flags & SECCOMP_FILTER_FLAG_MASK)
 		return -EINVAL;
 
 	/* Prepare the new filter before holding any locks. */
@@ -593,7 +714,20 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	if (IS_ERR(prepared))
 		return PTR_ERR(prepared);
 
+	/*
+	 * Make sure we cannot change seccomp or nnp state via TSYNC
+	 * while another thread is in the middle of calling exec.
+	 */
+	if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
+	    mutex_lock_killable(&current->signal->cred_guard_mutex))
+		goto out_free;
+
 	spin_lock_irq(&current->sighand->siglock);
+	if (unlikely(signal_group_exit(current->signal))) {
+		/* If thread is dying, return to process the signal. */
+		ret = -EAGAIN;
+		goto out;
+	}
 
 	if (!seccomp_check_mode(seccomp_mode))
 		goto out;
@@ -607,6 +741,9 @@ static long seccomp_set_mode_filter(unsigned int flags,
 	seccomp_assign_mode(current, seccomp_mode);
 out:
 	spin_unlock_irq(&current->sighand->siglock);
+	if (flags & SECCOMP_FILTER_FLAG_TSYNC)
+		mutex_unlock(&current->signal->cred_guard_mutex);
+out_free:
 	seccomp_filter_free(prepared);
 	return ret;
 }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
  2014-06-27 23:23   ` Kees Cook
@ 2014-07-09 18:05     ` Oleg Nesterov
  -1 siblings, 0 replies; 59+ messages in thread
From: Oleg Nesterov @ 2014-07-09 18:05 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

First of all, sorry for the delay ;)

So far I have quickly glanced at this series and everything looks fine, but
I am confused by the signal_group_exit() check:

On 06/27, Kees Cook wrote:
>
> To make sure that de_thread() is actually able
> to kill other threads during an exec, any sighand holders need to check
> if they've been scheduled to be killed, and to give up on their work.

Probably this connects to that check below? I can't understand this...

> +	/*
> +	 * Make sure we cannot change seccomp or nnp state via TSYNC
> +	 * while another thread is in the middle of calling exec.
> +	 */
> +	if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
> +	    mutex_lock_killable(&current->signal->cred_guard_mutex))
> +		goto out_free;

-EINVAL looks a bit confusing in this case, but this is cosmetic because
userspace won't see this error code anyway.

>  	spin_lock_irq(&current->sighand->siglock);
> +	if (unlikely(signal_group_exit(current->signal))) {
> +		/* If thread is dying, return to process the signal. */

OK, this doesn't hurt, but why?

You could check __fatal_signal_pending() with the same effect. And since
we hold this mutex, exec (de_thread) can be the source of that SIGKILL.
We take this mutex specially to avoid the race with exec.

So why do we need to abort if we race with kill() or exit_group()?

Oleg.




* Re: [PATCH v9 09/11] seccomp: introduce writer locking
  2014-06-27 23:22   ` Kees Cook
@ 2014-07-09 18:42     ` Oleg Nesterov
  -1 siblings, 0 replies; 59+ messages in thread
From: Oleg Nesterov @ 2014-07-09 18:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On 06/27, Kees Cook wrote:
>
>  static u32 seccomp_run_filters(int syscall)
>  {
> -	struct seccomp_filter *f;
> +	struct seccomp_filter *f = ACCESS_ONCE(current->seccomp.filter);

I am not sure...

This is fine if this ->filter is the 1st (and only) one; in this case
we can rely on the rmb() in the caller.

But the new filter can be installed at any moment, say, right after that
rmb(), although this doesn't matter. Either we need smp_read_barrier_depends()
after that, or smp_load_acquire() like the previous version did?

Oleg.




* Re: [PATCH v9 09/11] seccomp: introduce writer locking
  2014-07-09 18:42     ` Oleg Nesterov
@ 2014-07-09 18:55       ` Oleg Nesterov
  -1 siblings, 0 replies; 59+ messages in thread
From: Oleg Nesterov @ 2014-07-09 18:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On 07/09, Oleg Nesterov wrote:
>
> On 06/27, Kees Cook wrote:
> >
> >  static u32 seccomp_run_filters(int syscall)
> >  {
> > -	struct seccomp_filter *f;
> > +	struct seccomp_filter *f = ACCESS_ONCE(current->seccomp.filter);
>
> I am not sure...
>
> This is fine if this ->filter is the 1st (and only) one, in this case
> we can rely on rmb() in the caller.
>
> But the new filter can be installed at any moment. Say, right after that
> rmb() although this doesn't matter. Either we need smp_read_barrier_depends()
> after that, or smp_load_acquire() like the previous version did?

Wait... and it seems that seccomp_sync_threads() needs smp_store_release()
when it sets thread->filter = current->filter, for the same reason?

OTOH, the smp_store_release() in seccomp_attach_filter() can die; "current"
doesn't need a barrier to serialize with itself.

Oleg.




* Re: [PATCH v9 09/11] seccomp: introduce writer locking
  2014-06-27 23:22   ` Kees Cook
@ 2014-07-09 18:59     ` Oleg Nesterov
  -1 siblings, 0 replies; 59+ messages in thread
From: Oleg Nesterov @ 2014-07-09 18:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-kernel, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, linux-api, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On 06/27, Kees Cook wrote:
>
>  void put_seccomp_filter(struct task_struct *tsk)
>  {
> -	struct seccomp_filter *orig = tsk->seccomp.filter;
> +	struct seccomp_filter *orig = smp_load_acquire(&tsk->seccomp.filter);
>  	/* Clean up single-reference branches iteratively. */
>  	while (orig && atomic_dec_and_test(&orig->usage)) {

And this smp_load_acquire() looks unnecessary. atomic_dec_and_test()
is a barrier.

Oleg.




* Re: [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
  2014-07-09 18:05     ` Oleg Nesterov
  (?)
@ 2014-07-10  9:17       ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-07-10  9:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, Linux API, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On Wed, Jul 9, 2014 at 11:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> First of all, sorry for delay ;)
>
> So far I quickly glanced at this series and everything looks fine, but
> I am confused by the signal_group_exit() check,
>
> On 06/27, Kees Cook wrote:
>>
>> To make sure that de_thread() is actually able
>> to kill other threads during an exec, any sighand holders need to check
>> if they've been scheduled to be killed, and to give up on their work.
>
> Probably this connects to that check below? I can't understand this...
>
>> +     /*
>> +      * Make sure we cannot change seccomp or nnp state via TSYNC
>> +      * while another thread is in the middle of calling exec.
>> +      */
>> +     if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
>> +         mutex_lock_killable(&current->signal->cred_guard_mutex))
>> +             goto out_free;
>
> -EINVAL looks a bit confusing in this case, but this is cosmetic because
> userspace won't see this error code anyway.

Happy to use whatever since, as you say, it's cosmetic. Perhaps -EAGAIN?

>
>>       spin_lock_irq(&current->sighand->siglock);
>> +     if (unlikely(signal_group_exit(current->signal))) {
>> +             /* If thread is dying, return to process the signal. */
>
> OK, this doesn't hurt, but why?
>
> You could check __fatal_signal_pending() with the same effect. And since
> we hold this mutex, exec (de_thread) can be the source of that SIGKILL.
> We take this mutex specially to avoid the race with exec.
>
> So why do we need to abort if we race with kill() or exit_group()?

In my initial code inspection, I saw that we could block waiting for the
cred_guard mutex while exec held it; exec would schedule death in
de_thread, and then once the mutex was released, the tsync thread would
try to keep running.

However, in looking at this again, now I'm concerned this produces a
deadlock in de_thread, since it waits for all threads to actually
die, but tsync will be waiting on the killable mutex.

So I think I got too defensive when I read the top of de_thread where
it checks for pending signals itself.

It seems like I can just safely remove the signal_group_exit checks?
The other paths (non-tsync seccomp_set_mode_filter, and
seccomp_set_mode_strict) would just run until they finished the syscall,
and then die. I can't decide which feels cleaner: just letting stuff
clean up naturally on death or to short-circuit after taking
sighand->siglock.

What do you think?

-Kees

-- 
Kees Cook
Chrome OS Security



* Re: [PATCH v9 09/11] seccomp: introduce writer locking
  2014-07-09 18:55       ` Oleg Nesterov
  (?)
@ 2014-07-10  9:25         ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-07-10  9:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, Linux API, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On Wed, Jul 9, 2014 at 11:55 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 07/09, Oleg Nesterov wrote:
>>
>> On 06/27, Kees Cook wrote:
>> >
>> >  static u32 seccomp_run_filters(int syscall)
>> >  {
>> > -   struct seccomp_filter *f;
>> > +   struct seccomp_filter *f = ACCESS_ONCE(current->seccomp.filter);
>>
>> I am not sure...
>>
>> This is fine if this ->filter is the 1st (and only) one, in this case
>> we can rely on rmb() in the caller.
>>
>> But the new filter can be installed at any moment. Say, right after that
>> rmb() although this doesn't matter. Either we need smp_read_barrier_depends()
>> after that, or smp_load_acquire() like the previous version did?
>
> Wait... and it seems that seccomp_sync_threads() needs smp_store_release()
> when it sets thread->filter = current->filter, for the same reason?
>
> OTOH, the smp_store_release() in seccomp_attach_filter() can die; "current"
> doesn't need a barrier to serialize with itself.

I have lost track of what you're suggesting to change. :)

Since rmb() happens before run_filters, isn't the ACCESS_ONCE
sufficient? We only care that TIF_SECCOMP, the mode, and some filter are
valid. In a tsync thread race, it's okay to not use the deepest
filter node in the list; it just needs A filter.

-Kees

-- 
Kees Cook
Chrome OS Security



* Re: [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
  2014-07-10  9:17       ` Kees Cook
@ 2014-07-10 15:08         ` Oleg Nesterov
  -1 siblings, 0 replies; 59+ messages in thread
From: Oleg Nesterov @ 2014-07-10 15:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, Linux API, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On 07/10, Kees Cook wrote:
>
> On Wed, Jul 9, 2014 at 11:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> >> +     /*
> >> +      * Make sure we cannot change seccomp or nnp state via TSYNC
> >> +      * while another thread is in the middle of calling exec.
> >> +      */
> >> +     if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
> >> +         mutex_lock_killable(&current->signal->cred_guard_mutex))
> >> +             goto out_free;
> >
> > -EINVAL looks a bit confusing in this case, but this is cosmetic because
> > userspace won't see this error-code anyway.
>
> Happy to use whatever since, as you say, it's cosmetic. Perhaps -EAGAIN?

Or -EINTR. I do not really mind, I only mentioned this because I had another
nit.

> >>       spin_lock_irq(&current->sighand->siglock);
> >> +     if (unlikely(signal_group_exit(current->signal))) {
> >> +             /* If thread is dying, return to process the signal. */
> >
> > OK, this doesn't hurt, but why?
> >
> > You could check __fatal_signal_pending() with the same effect. And since
> > we hold this mutex, exec (de_thread) can be the source of that SIGKILL.
> > We take this mutex specially to avoid the race with exec.
> >
> > So why do we need to abort if we race with kill() or exit_group()?
>
> In my initial code inspection, I saw that we could block waiting for
> the cred_guard mutex while exec held it; exec would schedule death in
> de_thread, and then once it released, the tsync thread would try to
> keep running.
>
> However, in looking at this again, now I'm concerned this produces a
> deadlock in de_thread, since it waits for all threads to actually
> die, but tsync will be waiting with the killable mutex.

That is why you should always use _killable (or _interruptible) if you
want to take ->cred_guard_mutex.

If this thread races with de_thread() which holds this mutex, it will
be killed and mutex_lock_killable() will fail.

(to clarify; this deadlock is not "fatal", de_thread() can be killed too,
 but this doesn't really matter).

> So I think I got too defensive when I read the top of de_thread where
> it checks for pending signals itself.
>
> It seems like I can just safely remove the signal_group_exit checks?
> The other paths (non-tsync seccomp_set_mode_filter, and
> seccomp_set_mode_strict)

Yes, I missed another signal_group_exit() in seccomp_set_mode_strict().
It looks equally unneeded.

> I can't decide which feels cleaner: just letting stuff
> clean up naturally on death or to short-circuit after taking
> sighand->siglock.

I'd prefer to simply remove the signal_group_exit checks.

I won't argue if you prefer to keep them, but then please add a comment
to explain that this is not needed for correctness.

Because otherwise the code looks confusing, as if there is a subtle reason
why we must not do this if killed.

Oleg.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v9 09/11] seccomp: introduce writer locking
  2014-07-10  9:25         ` Kees Cook
@ 2014-07-10 15:24           ` Oleg Nesterov
  -1 siblings, 0 replies; 59+ messages in thread
From: Oleg Nesterov @ 2014-07-10 15:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, Linux API, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On 07/10, Kees Cook wrote:
>
> On Wed, Jul 9, 2014 at 11:55 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > On 07/09, Oleg Nesterov wrote:
> >>
> >> On 06/27, Kees Cook wrote:
> >> >
> >> >  static u32 seccomp_run_filters(int syscall)
> >> >  {
> >> > -   struct seccomp_filter *f;
> >> > +   struct seccomp_filter *f = ACCESS_ONCE(current->seccomp.filter);
> >>
> >> I am not sure...
> >>
> >> This is fine if this ->filter is the 1st (and only) one, in this case
> >> we can rely on rmb() in the caller.
> >>
> >> But the new filter can be installed at any moment. Say, right after that
> >> rmb() although this doesn't matter. Either we need smp_read_barrier_depends()
> >> after that, or smp_load_acquire() like the previous version did?
> >
> > Wait... and it seems that seccomp_sync_threads() needs smp_store_release()
> > when it sets thread->filter = current->filter by the same reason?
> >
> > OTOH. smp_store_release() in seccomp_attach_filter() can die, "current"
> > doesn't need a barrier to serialize with itself.
>
> I have lost track of what you're suggesting to change. :)

Perhaps I am just trying to confuse you and myself ;)

But,

> Since rmb() happens before run_filters, isn't the ACCESS_ONCE
> sufficient?

Yes. But see above. ACCESS_ONCE is sufficient if we read the first filter
installed by another thread; in this case rmb() pairs with smp_mb__before_atomic()
before set_bit(TIF_SECCOMP).

IOW, if this thread sees TIF_SECCOMP, it should also see all modifications
which were done before set_bit, including the data that ->filter points to.

> We only care that TIF_SECCOMP, mode, and some filter is
> valid. In a tsync thread race, it's okay to not use the deepest
> filter node in the list,

Yes, it is fine if we miss yet another filter which was just installed by
another thread.

But, unless I missed something, the problem is that we can get this new
filter.

Just to simplify. Suppose TIF_SECCOMP was set a long ago. This thread
has a single filter F1 and it enters seccomp_run_filters().

Right before it does ACCESS_ONCE() to read the pointer, another thread
does seccomp_sync_threads() and sets .filter = F2.

If ACCESS_ONCE() returns F1 - everything is fine. But it can see the new
pointer F2, and in this case we need a barrier to ensure that, say,
LOAD(F2->prog) will see all the preceding changes in this memory.

Oleg.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
@ 2014-07-10 16:03           ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-07-10 16:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, Linux API, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On Thu, Jul 10, 2014 at 8:08 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 07/10, Kees Cook wrote:
>>
>> On Wed, Jul 9, 2014 at 11:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >
>> >> +     /*
>> >> +      * Make sure we cannot change seccomp or nnp state via TSYNC
>> >> +      * while another thread is in the middle of calling exec.
>> >> +      */
>> >> +     if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
>> >> +         mutex_lock_killable(&current->signal->cred_guard_mutex))
>> >> +             goto out_free;
>> >
>> > -EINVAL looks a bit confusing in this case, but this is cosmetic because
>> > userspace won't see this error-code anyway.
>>
>> Happy to use whatever since, as you say, it's cosmetic. Perhaps -EAGAIN?
>
> Or -EINTR. I do not really mind, I only mentioned this because I had another
> nit.
>
>> >>       spin_lock_irq(&current->sighand->siglock);
>> >> +     if (unlikely(signal_group_exit(current->signal))) {
>> >> +             /* If thread is dying, return to process the signal. */
>> >
>> > OK, this doesn't hurt, but why?
>> >
>> > You could check __fatal_signal_pending() with the same effect. And since
>> > we hold this mutex, exec (de_thread) can be the source of that SIGKILL.
>> > We take this mutex specially to avoid the race with exec.
>> >
>> > So why do we need to abort if we race with kill() or exit_group()?
>>
>> In my initial code inspection, I saw that we could block waiting for
>> the cred_guard mutex while exec held it; exec would schedule death in
>> de_thread, and then once it released, the tsync thread would try to
>> keep running.
>>
>> However, in looking at this again, now I'm concerned this produces a
>> deadlock in de_thread, since it waits for all threads to actually
>> die, but tsync will be waiting with the killable mutex.
>
> That is why you should always use _killable (or _interruptible) if you
> want to take ->cred_guard_mutex.
>
> If this thread races with de_thread() which holds this mutex, it will
> be killed and mutex_lock_killable() will fail.
>
> (to clarify; this deadlock is not "fatal", de_thread() can be killed too,
>  but this doesn't really matter).
>
>> So I think I got too defensive when I read the top of de_thread where
>> it checks for pending signals itself.
>>
>> It seems like I can just safely remove the signal_group_exit checks?
>> The other paths (non-tsync seccomp_set_mode_filter, and
>> seccomp_set_mode_strict)
>
> Yes, I missed another signal_group_exit() in seccomp_set_mode_strict().
> It looks equally unneeded.
>
>> I can't decide which feels cleaner: just letting stuff
>> clean up naturally on death or to short-circuit after taking
>> sighand->siglock.
>
> I'd prefer to simply remove the signal_group_exit checks.
>
> I won't argue if you prefer to keep them, but then please add a comment
> to explain that this is not needed for correctness.
>
> Because otherwise the code looks confusing, as if there is a subtle reason
> why we must not do this if killed.

Sounds good! I'll clean it all up for v10.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
@ 2014-07-10 16:03           ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-07-10 16:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, Linux API,
	x86-DgEjT+Ai2ygdnm+yROfE0A,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-mips-6z/3iImG2C8G8FEW9MqTrA, linux-arch,
	linux-security-module

On Thu, Jul 10, 2014 at 8:08 AM, Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On 07/10, Kees Cook wrote:
>>
>> On Wed, Jul 9, 2014 at 11:05 AM, Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> >
>> >> +     /*
>> >> +      * Make sure we cannot change seccomp or nnp state via TSYNC
>> >> +      * while another thread is in the middle of calling exec.
>> >> +      */
>> >> +     if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
>> >> +         mutex_lock_killable(&current->signal->cred_guard_mutex))
>> >> +             goto out_free;
>> >
>> > -EINVAL looks a bit confusing in this case, but this is cosemtic because
>> > userspace won't see this error-code anyway.
>>
>> Happy to use whatever since, as you say, it's cosmetic. Perhaps -EAGAIN?
>
> Or -EINTR. I do not really mind, I only mentioned this because I had another
> nit.
>
>> >>       spin_lock_irq(&current->sighand->siglock);
>> >> +     if (unlikely(signal_group_exit(current->signal))) {
>> >> +             /* If thread is dying, return to process the signal. */
>> >
>> > OK, this doesn't hurt, but why?
>> >
>> > You could check __fatal_signal_pending() with the same effect. And since
>> > we hold this mutex, exec (de_thread) can be the source of that SIGKILL.
>> > We take this mutex specially to avoid the race with exec.
>> >
>> > So why do we need to abort if we race with kill() or exit_grouo() ?
>>
>> In my initial code inspection that we could block waiting for the
>> cred_guard mutex, with exec holding it, exec would schedule death in
>> de_thread, and then once it released, the tsync thread would try to
>> keep running.
>>
>> However, in looking at this again, now I'm concerned this produces a
>> dead-lock in de_thread, since it waits for all threads to actually
>> die, but tsync will be waiting with the killable mutex.
>
> That is why you should always use _killable (or _interruptible) if you
> want to take ->cred_guard_mutex.
>
> If this thread races with de_thread() which holds this mutex, it will
> be killed and mutex_lock_killable() will fail.
>
> (to clarify; this deadlock is not "fatal", de_thread() can be killed too,
>  but this doesn't really matter).
>
>> So I think I got too defensive when I read the top of de_thread where
>> it checks for pending signals itself.
>>
>> It seems like I can just safely remove the singal_group_exit checks?
>> The other paths (non-tsync seccomp_set_mode_filter, and
>> seccomp_set_mode_strict)
>
> Yes, I missed another signal_group_exit() in seccomp_set_mode_strict().
> It looks equally unneeded.
>
>> I can't decide which feels cleaner: just letting stuff
>> clean up naturally on death or to short-circuit after taking
>> sighand->siglock.
>
> I'd prefer to simply remove the singal_group_exit checks.
>
> I won't argue if you prefer to keep them, but then please add a comment
> to explain that this is not needed for correctness.
>
> Because otherwise the code looks confusing, as if there is a subtle reason
> why we must not do this if killed.

Sounds good! I'll clean it all up for v10.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC
@ 2014-07-10 16:03           ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-07-10 16:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, Andy Lutomirski, Michael Kerrisk (man-pages),
	Alexei Starovoitov, Andrew Morton, Daniel Borkmann, Will Drewry,
	Julien Tinnes, David Drysdale, Linux API, x86, linux-arm-kernel,
	linux-mips, linux-arch, linux-security-module

On Thu, Jul 10, 2014 at 8:08 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 07/10, Kees Cook wrote:
>>
>> On Wed, Jul 9, 2014 at 11:05 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> >
>> >> +     /*
>> >> +      * Make sure we cannot change seccomp or nnp state via TSYNC
>> >> +      * while another thread is in the middle of calling exec.
>> >> +      */
>> >> +     if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
>> >> +         mutex_lock_killable(&current->signal->cred_guard_mutex))
>> >> +             goto out_free;
>> >
>> > -EINVAL looks a bit confusing in this case, but this is cosemtic because
>> > userspace won't see this error-code anyway.
>>
>> Happy to use whatever since, as you say, it's cosmetic. Perhaps -EAGAIN?
>
> Or -EINTR. I do not really mind, I only mentioned this because I had another
> nit.
>
>> >>       spin_lock_irq(&current->sighand->siglock);
>> >> +     if (unlikely(signal_group_exit(current->signal))) {
>> >> +             /* If thread is dying, return to process the signal. */
>> >
>> > OK, this doesn't hurt, but why?
>> >
>> > You could check __fatal_signal_pending() with the same effect. And since
>> > we hold this mutex, exec (de_thread) can be the source of that SIGKILL.
>> > We take this mutex specially to avoid the race with exec.
>> >
>> > So why do we need to abort if we race with kill() or exit_grouo() ?
>>
>> In my initial code inspection that we could block waiting for the
>> cred_guard mutex, with exec holding it, exec would schedule death in
>> de_thread, and then once it released, the tsync thread would try to
>> keep running.
>>
>> However, in looking at this again, now I'm concerned this produces a
>> dead-lock in de_thread, since it waits for all threads to actually
>> die, but tsync will be waiting with the killable mutex.
>
> That is why you should always use _killable (or _interruptible) if you
> want to take ->cred_guard_mutex.
>
> If this thread races with de_thread() which holds this mutex, it will
> be killed and mutex_lock_killable() will fail.
>
> (to clarify; this deadlock is not "fatal", de_thread() can be killed too,
>  but this doesn't really matter).
>
>> So I think I got too defensive when I read the top of de_thread where
>> it checks for pending signals itself.
>>
>> It seems like I can just safely remove the singal_group_exit checks?
>> The other paths (non-tsync seccomp_set_mode_filter, and
>> seccomp_set_mode_strict)
>
> Yes, I missed another signal_group_exit() in seccomp_set_mode_strict().
> It looks equally unneeded.
>
>> I can't decide which feels cleaner: just letting stuff
>> clean up naturally on death or to short-circuit after taking
>> sighand->siglock.
>
> I'd prefer to simply remove the singal_group_exit checks.
>
> I won't argue if you prefer to keep them, but then please add a comment
> to explain that this is not needed for correctness.
>
> Because otherwise the code looks confusing, as if there is a subtle reason
> why we must not do this if killed.

Sounds good! I'll clean it all up for v10.

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 59+ messages in thread


* Re: [PATCH v9 09/11] seccomp: introduce writer locking
  2014-07-10 15:24           ` Oleg Nesterov
  (?)
@ 2014-07-10 16:54             ` Kees Cook
  -1 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2014-07-10 16:54 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-arch, linux-mips, Will Drewry, linux-security-module,
	Linux API, x86, LKML, Andy Lutomirski, Daniel Borkmann,
	Julien Tinnes, Michael Kerrisk (man-pages),
	Andrew Morton, David Drysdale, linux-arm-kernel,
	Alexei Starovoitov

On Thu, Jul 10, 2014 at 8:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> On 07/10, Kees Cook wrote:
>>
>> On Wed, Jul 9, 2014 at 11:55 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>> > On 07/09, Oleg Nesterov wrote:
>> >>
>> >> On 06/27, Kees Cook wrote:
>> >> >
>> >> >  static u32 seccomp_run_filters(int syscall)
>> >> >  {
>> >> > -   struct seccomp_filter *f;
>> >> > +   struct seccomp_filter *f = ACCESS_ONCE(current->seccomp.filter);
>> >>
>> >> I am not sure...
>> >>
>> >> This is fine if this ->filter is the 1st (and only) one, in this case
>> >> we can rely on rmb() in the caller.
>> >>
>> >> But the new filter can be installed at any moment. Say, right after that
>> >> rmb() although this doesn't matter. Either we need smp_read_barrier_depends()
>> >> after that, or smp_load_acquire() like the previous version did?
>> >
>> > Wait... and it seems that seccomp_sync_threads() needs smp_store_release()
>> > when it sets thread->filter = current->filter by the same reason?
>> >
>> > OTOH. smp_store_release() in seccomp_attach_filter() can die, "current"
>> > doesn't need a barrier to serialize with itself.
>>
>> I have lost track of what you're suggesting to change. :)
>
> Perhaps I am just trying to confuse you and myself ;)
>
> But,
>
>> Since rmb() happens before run_filters, isn't the ACCESS_ONCE
>> sufficient?
>
> Yes. But see above. ACCESS_ONCE is sufficient if we read the first filter
> installed by another thread, in this case rmb() pairs with mb_before_atomic()
> before set_bit(TIF_SECCOMP).
>
> IOW, if this thread sees TIF_SECCOMP, it should also see all modifications
> which were done before set_bit, including the data in ->filter points to.
>
>> We only care that TIF_SECCOMP, mode, and some filter is
>> valid. In a tsync thread race, it's okay to not use the deepest
>> filter node in the list,
>
> Yes, it is fine if we miss yet another filter which was just installed by
> another thread.
>
> But, unless I missed something, the problem is that we can get this new
> filter.
>
> Just to simplify. Suppose TIF_SECCOMP was set a long ago. This thread
> has a single filter F1 and it enters seccomp_run_filters().
>
> Right before it does ACCESS_ONCE() to read the pointer, another thread
> does seccomp_sync_threads() and sets .filter = F2.
>
> If ACCESS_ONCE() returns F1 - everything is fine. But it can see the new
> pointer F2, and in this case we need a barrier to ensure that, say,
> LOAD(F2->prog) will see all the preceding changes in this memory.

And the rmb() isn't sufficient for that? Is another barrier needed
before assigning the filter pointer to make sure the contents it
points to are flushed?

What's the least time-consuming operation I can use in run_filters?

-Kees

-- 
Kees Cook
Chrome OS Security




* Re: [PATCH v9 09/11] seccomp: introduce writer locking
  2014-07-10 16:54             ` Kees Cook
  (?)
@ 2014-07-10 17:35               ` Oleg Nesterov
  -1 siblings, 0 replies; 59+ messages in thread
From: Oleg Nesterov @ 2014-07-10 17:35 UTC (permalink / raw)
  To: Kees Cook
  Cc: linux-arch, linux-mips, Will Drewry, linux-security-module,
	Linux API, x86, LKML, Andy Lutomirski, Daniel Borkmann,
	Julien Tinnes, Michael Kerrisk (man-pages),
	Andrew Morton, David Drysdale, linux-arm-kernel,
	Alexei Starovoitov

On 07/10, Kees Cook wrote:
>
> On Thu, Jul 10, 2014 at 8:24 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Just to simplify. Suppose TIF_SECCOMP was set a long ago. This thread
> > has a single filter F1 and it enters seccomp_run_filters().
> >
> > Right before it does ACCESS_ONCE() to read the pointer, another thread
> > does seccomp_sync_threads() and sets .filter = F2.
> >
> > If ACCESS_ONCE() returns F1 - everything is fine. But it can see the new
> > pointer F2, and in this case we need a barrier to ensure that, say,
> > LOAD(F2->prog) will see all the preceding changes in this memory.
>
> And the rmb() isn't sufficient for that?

But it has no effect if the pointer was changed _after_ rmb() was already
called.

And, you need a barrier _after_ ACCESS_ONCE().

(Unless, again, we know that this is the first filter, but this is only
 by accident).

> Is another barrier needed
> before assigning the filter pointer to make sure the contents it
> points to are flushed?

I think smp_store_release() should be moved from seccomp_attach_filter()
to seccomp_sync_threads(). Although probably it _should_ work either way,
but at least this looks confusing because a) "current" doesn't need a
barrier to serialize with itself, and b) it is not clear why it is safe
to change the pointer dereferenced by another thread without a barrier.

> What's the least time-consuming operation I can use in run_filters?

As I said smp_read_barrier_depends() (nop unless alpha) or
smp_load_acquire() which you used in the previous version.

And to remind, afaics smp_load_acquire() in put_filter() should die ;)

Oleg.





end of thread, other threads:[~2014-07-10 17:38 UTC | newest]

Thread overview: 59+ messages
2014-06-27 23:22 [PATCH v9 0/11] seccomp: add thread sync ability Kees Cook
2014-06-27 23:22 ` [PATCH v9 01/11] seccomp: create internal mode-setting function Kees Cook
2014-06-27 23:22 ` [PATCH v9 02/11] seccomp: extract check/assign mode helpers Kees Cook
2014-06-27 23:22 ` [PATCH v9 03/11] seccomp: split mode setting routines Kees Cook
2014-06-27 23:22 ` [PATCH v9 04/11] seccomp: add "seccomp" syscall Kees Cook
2014-06-27 23:22 ` [PATCH v9 05/11] ARM: add seccomp syscall Kees Cook
2014-06-27 23:22 ` [PATCH v9 06/11] MIPS: " Kees Cook
2014-06-27 23:22 ` [PATCH v9 07/11] sched: move no_new_privs into new atomic flags Kees Cook
2014-06-27 23:22 ` [PATCH v9 08/11] seccomp: split filter prep from check and apply Kees Cook
2014-06-27 23:22 ` [PATCH v9 09/11] seccomp: introduce writer locking Kees Cook
2014-07-09 18:42   ` Oleg Nesterov
2014-07-09 18:55     ` Oleg Nesterov
2014-07-10  9:25       ` Kees Cook
2014-07-10 15:24         ` Oleg Nesterov
2014-07-10 16:54           ` Kees Cook
2014-07-10 17:35             ` Oleg Nesterov
2014-07-09 18:59   ` Oleg Nesterov
2014-06-27 23:22 ` [PATCH v9 10/11] seccomp: allow mode setting across threads Kees Cook
2014-06-27 23:23 ` [PATCH v9 11/11] seccomp: implement SECCOMP_FILTER_FLAG_TSYNC Kees Cook
2014-07-09 18:05   ` Oleg Nesterov
2014-07-10  9:17     ` Kees Cook
2014-07-10 15:08       ` Oleg Nesterov
2014-07-10 16:03         ` Kees Cook
