linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] seccomp: Add a trace action that allows the syscall if ptracer absent
@ 2020-03-22  6:20 Keno Fischer
  2020-03-22 20:26 ` Andy Lutomirski
  0 siblings, 1 reply; 2+ messages in thread
From: Keno Fischer @ 2020-03-22  6:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Kees Cook, Andy Lutomirski, Will Drewry, Oleg Nesterov, robert

# Background

One usecase for seccomp that has become more common is as a sort-of
pre-filter for ptrace events. This is useful to reduce debugging
overhead by avoiding traps to the ptracer for events that the
ptracer is not interested in and is used extensively for this
purpose in modern ptrace clients such as the rr debugger [1].

As a simpler, but illustrative example, consider perhaps an
expression like `strace -e open`. An advanced strace implementation
could set a seccomp filter that causes a SECCOMP_RET_TRACE option
for open system calls only (or even only system calls involving a
particular file descriptor).

However, when used for this purpose, the persistent and inherited
nature of the seccomp filter can present a problem (though
perfectly sensible from the perspective of a security feature of
course). Of course the ptracer could simply attach itself to the
entire process tree, but this of course presents its own performance
challenges negating the originall performance advantage (e.g.
in the `strace` example above, strace may only want to trace its
original child).

The most desirable feature would probably be for the ptracer to have
the capability to remove a seccomp filter from a child, but I
understand that the design of seccomp as an immutable tree of filters
makes this difficult, not to mention that such a feature is perhaps
undesirable given seccomp's intention as a security feature (even
if guarded behind a non-default flag). This instead implements an
additional trace action, that I believe can achieve much the same
effect while being significantly less invasive:

# This patch

The default behavior of the `SECCOMP_RET_TRACE` is to fail the system
call with `ENOSYS` if no ptracer is attached to the process. This
patch adds a similar option `SECCOMP_RET_TRACE_ALLOW`, that behaves
identically to `SECCOMP_RET_TRACE` in all respects, except that it
allows the system call to go forward unmodified in the case that no
ptracer is attached (or that the ptracer does not have the seccomp
event enabled).

This remedies the situation discussed in the previous section by making
it irrelevant whether or not the seccomp filter is installed in
the children of the original initial task, as they will not see any
difference in behavior from the presence of a properly-implemented
seccomp filter as long as no ptracer is attached.

I think another possible use of an interface like this would be ptracers
that want to occaisionally attach to running processes for profiling
or logging purposes, but otherwise do not want to disturb the
functioning of the process.

It is not entirely satisfying of course, as the new behavior will
be re-activated by any new ptracer attaching to such a process,
even if such a ptracer is unaware of the presence of the seccomp filter.
Such a situation is probably fairly rare (esp. since the alternative
is just to keep every child ptrace-attached, which would prevent
other ptracers from attaching anyway), but it deserves pointing out
nonetheless.

# Alternatives considered

As discussed above, I considered proposing some mechanism for the
removal of seccomp filters after the fact, but rejected this approach
because it seemed to invasive and counter to the design intent of the
API.

Another idea I considered was allowing the BPF filter to access some
ptracer-writable metadata area that could be used to modify the behavior
of the filter after the fact (if enabled by a flag). I still think
this would potentially be a viable solution. However, I worried that
this would get too far into the realm of allowing eBPF seccomp filters,
which as far as I understand was previously rejected (though I did
not completely research the history of this suggestion). A solution
like this would of course address the shortcoming I mentioned above
by allowing the ptracer to render the filter truly inert.

Lastly I considered instead adding a field to the seccomp_data
structure that would indicate whether or not the ptracer was available
or not. However, that approached seemed a lot more likely to result
in user confusion and incorrect filters than simply adding an additional
action.

[1] https://github.com/mozilla/rr

Signed-Off-By: Keno Fischer <keno@juliacomputing.com>

---
 .../userspace-api/seccomp_filter.rst          |  5 ++++
 include/uapi/linux/seccomp.h                  |  1 +
 kernel/seccomp.c                              | 26 +++++++++++++------
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
index bd9165241b6c..95098c20c3db 100644
--- a/Documentation/userspace-api/seccomp_filter.rst
+++ b/Documentation/userspace-api/seccomp_filter.rst
@@ -151,6 +151,11 @@ In precedence order, they are:
 	allow use of ptrace, even of other sandboxed processes, without
 	extreme care; ptracers can use this mechanism to escape.)
 
+``SECCOMP_RET_TRACE_ALLOW``:
+	Like ``SECCOMP_RET_TRACE``, but if no tracer is present (or the tracer is
+	not listening to seccomp events), allow the syscall rather than returning
+	ENOSYS.
+
 ``SECCOMP_RET_LOG``:
 	Results in the system call being executed after it is logged. This
 	should be used by application developers to learn which syscalls their
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index be84d87f1f46..7d9997a98d06 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -39,6 +39,7 @@
 #define SECCOMP_RET_ERRNO	 0x00050000U /* returns an errno */
 #define SECCOMP_RET_USER_NOTIF	 0x7fc00000U /* notifies userspace */
 #define SECCOMP_RET_TRACE	 0x7ff00000U /* pass to a tracer or disallow */
+#define SECCOMP_RET_TRACE_ALLOW	 0x7ff30000U /* pass to a tracer or allow */
 #define SECCOMP_RET_LOG		 0x7ffc0000U /* allow after logging */
 #define SECCOMP_RET_ALLOW	 0x7fff0000U /* allow */
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b6ea3dcb57bf..af7d38666b6c 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -623,6 +623,7 @@ static void seccomp_send_sigsys(int syscall, int reason)
 #define SECCOMP_LOG_LOG			(1 << 5)
 #define SECCOMP_LOG_ALLOW		(1 << 6)
 #define SECCOMP_LOG_USER_NOTIF		(1 << 7)
+#define SECCOMP_LOG_TRACE_ALLOW		(1 << 8)
 
 static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
 				    SECCOMP_LOG_KILL_THREAD  |
@@ -630,6 +631,7 @@ static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
 				    SECCOMP_LOG_ERRNO |
 				    SECCOMP_LOG_USER_NOTIF |
 				    SECCOMP_LOG_TRACE |
+				    SECCOMP_LOG_TRACE_ALLOW |
 				    SECCOMP_LOG_LOG;
 
 static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
@@ -646,6 +648,7 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
 	case SECCOMP_RET_ERRNO:
 		log = requested && seccomp_actions_logged & SECCOMP_LOG_ERRNO;
 		break;
+	case SECCOMP_RET_TRACE_ALLOW:
 	case SECCOMP_RET_TRACE:
 		log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE;
 		break;
@@ -832,6 +835,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 		seccomp_send_sigsys(this_syscall, data);
 		goto skip;
 
+	case SECCOMP_RET_TRACE_ALLOW:
 	case SECCOMP_RET_TRACE:
 		/* We've been put in this state by the ptracer already. */
 		if (recheck_after_trace)
@@ -839,9 +843,12 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 
 		/* ENOSYS these calls if there is no tracer attached. */
 		if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) {
+			if (action == SECCOMP_RET_TRACE_ALLOW)
+				return 0;
+
 			syscall_set_return_value(current,
-						 task_pt_regs(current),
-						 -ENOSYS, 0);
+						task_pt_regs(current),
+						-ENOSYS, 0);
 			goto skip;
 		}
 
@@ -1374,6 +1381,7 @@ static long seccomp_get_action_avail(const char __user *uaction)
 	case SECCOMP_RET_TRAP:
 	case SECCOMP_RET_ERRNO:
 	case SECCOMP_RET_USER_NOTIF:
+	case SECCOMP_RET_TRACE_ALLOW:
 	case SECCOMP_RET_TRACE:
 	case SECCOMP_RET_LOG:
 	case SECCOMP_RET_ALLOW:
@@ -1591,12 +1599,13 @@ long seccomp_get_metadata(struct task_struct *task,
 /* Human readable action names for friendly sysctl interaction */
 #define SECCOMP_RET_KILL_PROCESS_NAME	"kill_process"
 #define SECCOMP_RET_KILL_THREAD_NAME	"kill_thread"
-#define SECCOMP_RET_TRAP_NAME		"trap"
-#define SECCOMP_RET_ERRNO_NAME		"errno"
-#define SECCOMP_RET_USER_NOTIF_NAME	"user_notif"
-#define SECCOMP_RET_TRACE_NAME		"trace"
-#define SECCOMP_RET_LOG_NAME		"log"
-#define SECCOMP_RET_ALLOW_NAME		"allow"
+#define SECCOMP_RET_TRAP_NAME			"trap"
+#define SECCOMP_RET_ERRNO_NAME			"errno"
+#define SECCOMP_RET_USER_NOTIF_NAME		"user_notif"
+#define SECCOMP_RET_TRACE_NAME			"trace"
+#define SECCOMP_RET_TRACE_ALLOW_NAME 	"trace_allow"
+#define SECCOMP_RET_LOG_NAME		 	"log"
+#define SECCOMP_RET_ALLOW_NAME		 	"allow"
 
 static const char seccomp_actions_avail[] =
 				SECCOMP_RET_KILL_PROCESS_NAME	" "
@@ -1620,6 +1629,7 @@ static const struct seccomp_log_name seccomp_log_names[] = {
 	{ SECCOMP_LOG_ERRNO, SECCOMP_RET_ERRNO_NAME },
 	{ SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME },
 	{ SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME },
+	{ SECCOMP_LOG_TRACE_ALLOW, SECCOMP_RET_TRACE_ALLOW_NAME },
 	{ SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME },
 	{ SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME },
 	{ }
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH] seccomp: Add a trace action that allows the syscall if ptracer absent
  2020-03-22  6:20 [RFC PATCH] seccomp: Add a trace action that allows the syscall if ptracer absent Keno Fischer
@ 2020-03-22 20:26 ` Andy Lutomirski
  0 siblings, 0 replies; 2+ messages in thread
From: Andy Lutomirski @ 2020-03-22 20:26 UTC (permalink / raw)
  To: Keno Fischer; +Cc: linux-kernel, Kees Cook, Will Drewry, Oleg Nesterov, robert



> On Mar 21, 2020, at 11:20 PM, Keno Fischer <keno@juliacomputing.com> wrote:
> 
> # Background
> 
> One usecase for seccomp that has become more common is as a sort-of
> pre-filter for ptrace events. This is useful to reduce debugging
> overhead by avoiding traps to the ptracer for events that the
> ptracer is not interested in and is used extensively for this
> purpose in modern ptrace clients such as the rr debugger [1].

Hmm. My impression is that this patch takes an existing so-so trick and makes it a bit weirder and a bit less broken but still quite hackish.

I suggest a different solution: add something like PTRACE_BPF: it resumes the target and passes each event through a BPF program.  The BPF program can decide whether to ignore the event or trap.  Or log it!

Sadly, unprivileged eBPF may still be stalled, so you may have to fudge it with classic BPF, which is doable but more awkward and less powerful.

> 
> As a simpler, but illustrative example, consider perhaps an
> expression like `strace -e open`. An advanced strace implementation
> could set a seccomp filter that causes a SECCOMP_RET_TRACE option
> for open system calls only (or even only system calls involving a
> particular file descriptor).
> 
> However, when used for this purpose, the persistent and inherited
> nature of the seccomp filter can present a problem (though
> perfectly sensible from the perspective of a security feature of
> course). Of course the ptracer could simply attach itself to the
> entire process tree, but this of course presents its own performance
> challenges negating the originall performance advantage (e.g.
> in the `strace` example above, strace may only want to trace its
> original child).
> 
> The most desirable feature would probably be for the ptracer to have
> the capability to remove a seccomp filter from a child, but I
> understand that the design of seccomp as an immutable tree of filters
> makes this difficult, not to mention that such a feature is perhaps
> undesirable given seccomp's intention as a security feature (even
> if guarded behind a non-default flag). This instead implements an
> additional trace action, that I believe can achieve much the same
> effect while being significantly less invasive:
> 
> # This patch
> 
> The default behavior of the `SECCOMP_RET_TRACE` is to fail the system
> call with `ENOSYS` if no ptracer is attached to the process. This
> patch adds a similar option `SECCOMP_RET_TRACE_ALLOW`, that behaves
> identically to `SECCOMP_RET_TRACE` in all respects, except that it
> allows the system call to go forward unmodified in the case that no
> ptracer is attached (or that the ptracer does not have the seccomp
> event enabled).
> 
> This remedies the situation discussed in the previous section by making
> it irrelevant whether or not the seccomp filter is installed in
> the children of the original initial task, as they will not see any
> difference in behavior from the presence of a properly-implemented
> seccomp filter as long as no ptracer is attached.
> 
> I think another possible use of an interface like this would be ptracers
> that want to occaisionally attach to running processes for profiling
> or logging purposes, but otherwise do not want to disturb the
> functioning of the process.
> 
> It is not entirely satisfying of course, as the new behavior will
> be re-activated by any new ptracer attaching to such a process,
> even if such a ptracer is unaware of the presence of the seccomp filter.
> Such a situation is probably fairly rare (esp. since the alternative
> is just to keep every child ptrace-attached, which would prevent
> other ptracers from attaching anyway), but it deserves pointing out
> nonetheless.
> 
> # Alternatives considered
> 
> As discussed above, I considered proposing some mechanism for the
> removal of seccomp filters after the fact, but rejected this approach
> because it seemed to invasive and counter to the design intent of the
> API.
> 
> Another idea I considered was allowing the BPF filter to access some
> ptracer-writable metadata area that could be used to modify the behavior
> of the filter after the fact (if enabled by a flag). I still think
> this would potentially be a viable solution. However, I worried that
> this would get too far into the realm of allowing eBPF seccomp filters,
> which as far as I understand was previously rejected (though I did
> not completely research the history of this suggestion). A solution
> like this would of course address the shortcoming I mentioned above
> by allowing the ptracer to render the filter truly inert.
> 
> Lastly I considered instead adding a field to the seccomp_data
> structure that would indicate whether or not the ptracer was available
> or not. However, that approached seemed a lot more likely to result
> in user confusion and incorrect filters than simply adding an additional
> action.
> 
> [1] https://github.com/mozilla/rr
> 
> Signed-Off-By: Keno Fischer <keno@juliacomputing.com>
> 
> ---
> .../userspace-api/seccomp_filter.rst          |  5 ++++
> include/uapi/linux/seccomp.h                  |  1 +
> kernel/seccomp.c                              | 26 +++++++++++++------
> 3 files changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
> index bd9165241b6c..95098c20c3db 100644
> --- a/Documentation/userspace-api/seccomp_filter.rst
> +++ b/Documentation/userspace-api/seccomp_filter.rst
> @@ -151,6 +151,11 @@ In precedence order, they are:
>    allow use of ptrace, even of other sandboxed processes, without
>    extreme care; ptracers can use this mechanism to escape.)
> 
> +``SECCOMP_RET_TRACE_ALLOW``:
> +    Like ``SECCOMP_RET_TRACE``, but if no tracer is present (or the tracer is
> +    not listening to seccomp events), allow the syscall rather than returning
> +    ENOSYS.
> +
> ``SECCOMP_RET_LOG``:
>    Results in the system call being executed after it is logged. This
>    should be used by application developers to learn which syscalls their
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index be84d87f1f46..7d9997a98d06 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -39,6 +39,7 @@
> #define SECCOMP_RET_ERRNO     0x00050000U /* returns an errno */
> #define SECCOMP_RET_USER_NOTIF     0x7fc00000U /* notifies userspace */
> #define SECCOMP_RET_TRACE     0x7ff00000U /* pass to a tracer or disallow */
> +#define SECCOMP_RET_TRACE_ALLOW     0x7ff30000U /* pass to a tracer or allow */
> #define SECCOMP_RET_LOG         0x7ffc0000U /* allow after logging */
> #define SECCOMP_RET_ALLOW     0x7fff0000U /* allow */
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index b6ea3dcb57bf..af7d38666b6c 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -623,6 +623,7 @@ static void seccomp_send_sigsys(int syscall, int reason)
> #define SECCOMP_LOG_LOG            (1 << 5)
> #define SECCOMP_LOG_ALLOW        (1 << 6)
> #define SECCOMP_LOG_USER_NOTIF        (1 << 7)
> +#define SECCOMP_LOG_TRACE_ALLOW        (1 << 8)
> 
> static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
>                    SECCOMP_LOG_KILL_THREAD  |
> @@ -630,6 +631,7 @@ static u32 seccomp_actions_logged = SECCOMP_LOG_KILL_PROCESS |
>                    SECCOMP_LOG_ERRNO |
>                    SECCOMP_LOG_USER_NOTIF |
>                    SECCOMP_LOG_TRACE |
> +                    SECCOMP_LOG_TRACE_ALLOW |
>                    SECCOMP_LOG_LOG;
> 
> static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
> @@ -646,6 +648,7 @@ static inline void seccomp_log(unsigned long syscall, long signr, u32 action,
>    case SECCOMP_RET_ERRNO:
>        log = requested && seccomp_actions_logged & SECCOMP_LOG_ERRNO;
>        break;
> +    case SECCOMP_RET_TRACE_ALLOW:
>    case SECCOMP_RET_TRACE:
>        log = requested && seccomp_actions_logged & SECCOMP_LOG_TRACE;
>        break;
> @@ -832,6 +835,7 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
>        seccomp_send_sigsys(this_syscall, data);
>        goto skip;
> 
> +    case SECCOMP_RET_TRACE_ALLOW:
>    case SECCOMP_RET_TRACE:
>        /* We've been put in this state by the ptracer already. */
>        if (recheck_after_trace)
> @@ -839,9 +843,12 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
> 
>        /* ENOSYS these calls if there is no tracer attached. */
>        if (!ptrace_event_enabled(current, PTRACE_EVENT_SECCOMP)) {
> +            if (action == SECCOMP_RET_TRACE_ALLOW)
> +                return 0;
> +
>            syscall_set_return_value(current,
> -                         task_pt_regs(current),
> -                         -ENOSYS, 0);
> +                        task_pt_regs(current),
> +                        -ENOSYS, 0);
>            goto skip;
>        }
> 
> @@ -1374,6 +1381,7 @@ static long seccomp_get_action_avail(const char __user *uaction)
>    case SECCOMP_RET_TRAP:
>    case SECCOMP_RET_ERRNO:
>    case SECCOMP_RET_USER_NOTIF:
> +    case SECCOMP_RET_TRACE_ALLOW:
>    case SECCOMP_RET_TRACE:
>    case SECCOMP_RET_LOG:
>    case SECCOMP_RET_ALLOW:
> @@ -1591,12 +1599,13 @@ long seccomp_get_metadata(struct task_struct *task,
> /* Human readable action names for friendly sysctl interaction */
> #define SECCOMP_RET_KILL_PROCESS_NAME    "kill_process"
> #define SECCOMP_RET_KILL_THREAD_NAME    "kill_thread"
> -#define SECCOMP_RET_TRAP_NAME        "trap"
> -#define SECCOMP_RET_ERRNO_NAME        "errno"
> -#define SECCOMP_RET_USER_NOTIF_NAME    "user_notif"
> -#define SECCOMP_RET_TRACE_NAME        "trace"
> -#define SECCOMP_RET_LOG_NAME        "log"
> -#define SECCOMP_RET_ALLOW_NAME        "allow"
> +#define SECCOMP_RET_TRAP_NAME            "trap"
> +#define SECCOMP_RET_ERRNO_NAME            "errno"
> +#define SECCOMP_RET_USER_NOTIF_NAME        "user_notif"
> +#define SECCOMP_RET_TRACE_NAME            "trace"
> +#define SECCOMP_RET_TRACE_ALLOW_NAME    "trace_allow"
> +#define SECCOMP_RET_LOG_NAME            "log"
> +#define SECCOMP_RET_ALLOW_NAME            "allow"
> 
> static const char seccomp_actions_avail[] =
>                SECCOMP_RET_KILL_PROCESS_NAME    " "
> @@ -1620,6 +1629,7 @@ static const struct seccomp_log_name seccomp_log_names[] = {
>    { SECCOMP_LOG_ERRNO, SECCOMP_RET_ERRNO_NAME },
>    { SECCOMP_LOG_USER_NOTIF, SECCOMP_RET_USER_NOTIF_NAME },
>    { SECCOMP_LOG_TRACE, SECCOMP_RET_TRACE_NAME },
> +    { SECCOMP_LOG_TRACE_ALLOW, SECCOMP_RET_TRACE_ALLOW_NAME },
>    { SECCOMP_LOG_LOG, SECCOMP_RET_LOG_NAME },
>    { SECCOMP_LOG_ALLOW, SECCOMP_RET_ALLOW_NAME },
>    { }
> -- 
> 2.24.0
> 

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-03-22 20:27 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-22  6:20 [RFC PATCH] seccomp: Add a trace action that allows the syscall if ptracer absent Keno Fischer
2020-03-22 20:26 ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).