All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: Sargun Dhillon <sargun@sargun.me>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Rodrigo Campos <rodrigo@kinvolk.io>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Will Drewry <wad@chromium.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Alban Crequy <alban@kinvolk.io>
Subject: Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier
Date: Fri, 29 Apr 2022 11:22:23 -0700	[thread overview]
Message-ID: <202204291120.79624BD@keescook> (raw)
In-Reply-To: <20220429023113.74993-2-sargun@sargun.me>

On Thu, Apr 28, 2022 at 07:31:12PM -0700, Sargun Dhillon wrote:
> This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
> that makes it so that when notifications are received by the supervisor
> the notifying process will transition to wait killable semantics. Although
> wait killable isn't a set of semantics formally exposed to userspace,
> the concept is searchable. If the notifying process is signaled prior
> to the notification being received by the userspace agent, it will
> be handled as normal.
> 
> One quirk about how this is handled is that the notifying process
> only switches to TASK_KILLABLE if it receives a wakeup from either
> an addfd or a signal. This is to avoid an unnecessary wakeup of
> the notifying task.
> 
> Signed-off-by: Sargun Dhillon <sargun@sargun.me>
> ---
>  .../userspace-api/seccomp_filter.rst          |  8 ++++
>  include/linux/seccomp.h                       |  3 +-
>  include/uapi/linux/seccomp.h                  |  2 +
>  kernel/seccomp.c                              | 42 ++++++++++++++++++-
>  4 files changed, 52 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
> index 539e9d4a4860..204cf5ba511a 100644
> --- a/Documentation/userspace-api/seccomp_filter.rst
> +++ b/Documentation/userspace-api/seccomp_filter.rst
> @@ -271,6 +271,14 @@ notifying process it will be replaced. The supervisor can also add an FD, and
>  respond atomically by using the ``SECCOMP_ADDFD_FLAG_SEND`` flag and the return
>  value will be the injected file descriptor number.
>  
> +The notifying process can be preempted, resulting in the notification being
> +aborted. This can be problematic when trying to take actions on behalf of the
> +notifying process that are long-running and typically retryable (mounting a
> +filesytem). Alternatively, the at filter installation time, the

typo: "the at" -> "at"

> +``SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV`` flag can be set. This flag makes it
> +such that when a user notification is received by the supervisor, the notifying
> +process will ignore non-fatal signals until the response is sent.
> +
>  It is worth noting that ``struct seccomp_data`` contains the values of register
>  arguments to the syscall, but does not contain pointers to memory. The task's
>  memory is accessible to suitably privileged traces via ``ptrace()`` or
> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> index 0c564e5d40ff..d31d76be4982 100644
> --- a/include/linux/seccomp.h
> +++ b/include/linux/seccomp.h
> @@ -8,7 +8,8 @@
>  					 SECCOMP_FILTER_FLAG_LOG | \
>  					 SECCOMP_FILTER_FLAG_SPEC_ALLOW | \
>  					 SECCOMP_FILTER_FLAG_NEW_LISTENER | \
> -					 SECCOMP_FILTER_FLAG_TSYNC_ESRCH)
> +					 SECCOMP_FILTER_FLAG_TSYNC_ESRCH | \
> +					 SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
>  
>  /* sizeof() the first published struct seccomp_notif_addfd */
>  #define SECCOMP_NOTIFY_ADDFD_SIZE_VER0 24
> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> index 78074254ab98..0fdc6ef02b94 100644
> --- a/include/uapi/linux/seccomp.h
> +++ b/include/uapi/linux/seccomp.h
> @@ -23,6 +23,8 @@
>  #define SECCOMP_FILTER_FLAG_SPEC_ALLOW		(1UL << 2)
>  #define SECCOMP_FILTER_FLAG_NEW_LISTENER	(1UL << 3)
>  #define SECCOMP_FILTER_FLAG_TSYNC_ESRCH		(1UL << 4)
> +/* Received notifications wait in killable state (only respond to fatal signals) */
> +#define SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV	(1UL << 5)
>  
>  /*
>   * All BPF programs must return a 32-bit value.
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index db10e73d06e0..9291b0843cb2 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -201,6 +201,8 @@ static inline void seccomp_cache_prepare(struct seccomp_filter *sfilter)
>   *	   the filter can be freed.
>   * @cache: cache of arch/syscall mappings to actions
>   * @log: true if all actions except for SECCOMP_RET_ALLOW should be logged
> + * @wait_killable_recv: Put notifying process in killable state once the
> + *			notification is received by the userspace listener.
>   * @prev: points to a previously installed, or inherited, filter
>   * @prog: the BPF program to evaluate
>   * @notif: the struct that holds all notification related information
> @@ -221,6 +223,7 @@ struct seccomp_filter {
>  	refcount_t refs;
>  	refcount_t users;
>  	bool log;
> +	bool wait_killable_recv;
>  	struct action_cache cache;
>  	struct seccomp_filter *prev;
>  	struct bpf_prog *prog;
> @@ -894,6 +897,10 @@ static long seccomp_attach_filter(unsigned int flags,
>  	if (flags & SECCOMP_FILTER_FLAG_LOG)
>  		filter->log = true;
>  
> +	/* Set wait killable flag, if present. */
> +	if (flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV)
> +		filter->wait_killable_recv = true;
> +
>  	/*
>  	 * If there is an existing filter, make it the prev and don't drop its
>  	 * task reference.
> @@ -1081,6 +1088,12 @@ static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd, struct seccomp_kn
>  	complete(&addfd->completion);
>  }
>  
> +static bool should_sleep_killable(struct seccomp_filter *match,
> +				  struct seccomp_knotif *n)
> +{
> +	return match->wait_killable_recv && n->state == SECCOMP_NOTIFY_SENT;
> +}
> +
>  static int seccomp_do_user_notification(int this_syscall,
>  					struct seccomp_filter *match,
>  					const struct seccomp_data *sd)
> @@ -1111,11 +1124,25 @@ static int seccomp_do_user_notification(int this_syscall,
>  	 * This is where we wait for a reply from userspace.
>  	 */
>  	do {
> +		bool wait_killable = should_sleep_killable(match, &n);
> +
>  		mutex_unlock(&match->notify_lock);
> -		err = wait_for_completion_interruptible(&n.ready);
> +		if (wait_killable)
> +			err = wait_for_completion_killable(&n.ready);
> +		else
> +			err = wait_for_completion_interruptible(&n.ready);
>  		mutex_lock(&match->notify_lock);
> -		if (err != 0)
> +
> +		if (err != 0) {
> +			/*
> +			 * Check to see if the notifcation got picked up and
> +			 * whether we should switch to wait killable.
> +			 */
> +			if (!wait_killable && should_sleep_killable(match, &n))
> +				continue;
> +
>  			goto interrupted;
> +		}
>  
>  		addfd = list_first_entry_or_null(&n.addfd,
>  						 struct seccomp_kaddfd, list);
> @@ -1485,6 +1512,9 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
>  		mutex_lock(&filter->notify_lock);
>  		knotif = find_notification(filter, unotif.id);
>  		if (knotif) {
> +			/* Reset the process to make sure it's not stuck */
> +			if (should_sleep_killable(filter, knotif))
> +				complete(&knotif->ready);
>  			knotif->state = SECCOMP_NOTIFY_INIT;
>  			up(&filter->notif->request);
>  		}
> @@ -1830,6 +1860,14 @@ static long seccomp_set_mode_filter(unsigned int flags,
>  	    ((flags & SECCOMP_FILTER_FLAG_TSYNC_ESRCH) == 0))
>  		return -EINVAL;
>  
> +	/*
> +	 * The SECCOMP_FILTER_FLAG_WAIT_KILLABLE_SENT flag doesn't make sense
> +	 * without the SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
> +	 */
> +	if ((flags & SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) &&
> +	    ((flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) == 0))
> +		return -EINVAL;
> +
>  	/* Prepare the new filter before holding any locks. */
>  	prepared = seccomp_prepare_user_filter(filter);
>  	if (IS_ERR(prepared))

Otherwise, looks good. Thanks for bringing this back up!

-- 
Kees Cook

  parent reply	other threads:[~2022-04-29 18:22 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-29  2:31 [PATCH v3 0/2] Handle seccomp notification preemption Sargun Dhillon
2022-04-29  2:31 ` [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier Sargun Dhillon
2022-04-29  9:42   ` Rodrigo Campos
2022-04-29 17:14     ` Sargun Dhillon
2022-04-29 18:20       ` Kees Cook
2022-05-02 12:48         ` Rodrigo Campos
2022-04-29 18:22   ` Kees Cook [this message]
2022-05-02 14:15   ` Rodrigo Campos
2022-05-02 16:04     ` Sargun Dhillon
2022-05-03 14:27       ` Rodrigo Campos
2022-04-29  2:31 ` [PATCH v3 2/2] selftests/seccomp: Add test for wait killable notifier Sargun Dhillon
2022-04-29 18:19   ` Kees Cook
2022-04-29 22:35     ` Sargun Dhillon
2022-04-29 22:43       ` Kees Cook
2022-04-29  9:24 ` [PATCH v3 0/2] Handle seccomp notification preemption Rodrigo Campos

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202204291120.79624BD@keescook \
    --to=keescook@chromium.org \
    --cc=alban@kinvolk.io \
    --cc=christian.brauner@ubuntu.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=gscrivan@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=rodrigo@kinvolk.io \
    --cc=sargun@sargun.me \
    --cc=wad@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.