From: Tycho Andersen <tycho@tycho.pizza>
To: Jann Horn <jannh@google.com>
Cc: linux-man <linux-man@vger.kernel.org>,
	Song Liu <songliubraving@fb.com>, Will Drewry <wad@chromium.org>,
	Kees Cook <keescook@chromium.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Robert Sesek <rsesek@google.com>,
	Linux Containers <containers@lists.linux-foundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Christian Brauner <christian.brauner@canonical.com>,
	"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	bpf <bpf@vger.kernel.org>, Andy Lutomirski <luto@amacapital.net>,
	Christian Brauner <christian@brauner.io>
Subject: Re: For review: seccomp_user_notif(2) manual page
Date: Thu, 1 Oct 2020 12:56:31 -0600
Message-ID: <20201001185631.GD1260245@cisco>
In-Reply-To: <CAG48ez1W+Ym5=-PdUhyei_UCJov0agEF4YVyARL=pooWYmdEAg@mail.gmail.com>

On Thu, Oct 01, 2020 at 08:18:49PM +0200, Jann Horn wrote:
> On Thu, Oct 1, 2020 at 6:58 PM Tycho Andersen <tycho@tycho.pizza> wrote:
> > On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn via Containers wrote:
> > > On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner
> > > <christian.brauner@canonical.com> wrote:
> > > > On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
> > > > > On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages)
> > > > > <mtk.manpages@gmail.com> wrote:
> > > > > > NOTES
> > > > > >        The file descriptor returned when seccomp(2) is employed with the
> > > > > >        SECCOMP_FILTER_FLAG_NEW_LISTENER  flag  can  be  monitored  using
> > > > > >        poll(2), epoll(7), and select(2).  When a notification  is  pend‐
> > > > > >        ing,  these interfaces indicate that the file descriptor is read‐
> > > > > >        able.
> > > > >
> > > > > We should probably also point out somewhere that, as
> > > > > include/uapi/linux/seccomp.h says:
> > > > >
> > > > >  * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
> > > > >  * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
> > > > >  * same syscall, the most recently added filter takes precedence. This means
> > > > >  * that the new SECCOMP_RET_USER_NOTIF filter can override any
> > > > >  * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
> > > > >  * such filtered syscalls to be executed by sending the response
> > > > >  * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
> > > > >  * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
> > > > >
> > > > > In other words, from a security perspective, you must assume that the
> > > > > target process can bypass any SECCOMP_RET_USER_NOTIF (or
> > > > > SECCOMP_RET_TRACE) filters unless it is completely prohibited from
> > > > > calling seccomp(). This should also be noted over in the main
> > > > > seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
> > > >
> > > > So I was actually wondering about this when I skimmed this and a while
> > > > ago but forgot about this again... Afaict, you can only ever load a
> > > > single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
> > > > already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
> > > > in the task's filter hierarchy then the kernel will refuse to load a new
> > > > one?
> > > >
> > > > static struct file *init_listener(struct seccomp_filter *filter)
> > > > {
> > > >         struct file *ret = ERR_PTR(-EBUSY);
> > > >         struct seccomp_filter *cur;
> > > >
> > > >         for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> > > >                 if (cur->notif)
> > > >                         goto out;
> > > >         }
> > > >
> > > > shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
> > > > override each other for the same task simply because there can only ever
> > > > be a single one?
> > >
> > > Good point. Exceeeept that that check seems ineffective because this
> > > happens before we take the locks that guard against TSYNC, and also
> > > before we decide to which existing filter we want to chain the new
> > > filter. So if two threads race with TSYNC, I think they'll be able to
> > > chain two filters with listeners together.
> >
> > Yep, seems the check needs to also be in seccomp_can_sync_threads() to
> > be totally effective,
> >
> > > I don't know whether we want to eternalize this "only one listener
> > > across all the filters" restriction in the manpage though, or whether
> > > the man page should just say that the kernel currently doesn't support
> > > it but that security-wise you should assume that it might at some
> > > point.
> >
> > This requirement originally came from Andy, arguing that the semantics
> > of this were/are confusing, which still makes sense to me. Perhaps we
> > should do something like the below?
> [...]
> > +static bool has_listener_parent(struct seccomp_filter *child)
> > +{
> > +       struct seccomp_filter *cur;
> > +
> > +       for (cur = child; cur; cur = cur->prev) {
> > +               if (cur->notif)
> > +                       return true;
> > +       }
> > +
> > +       return false;
> > +}
> [...]
> > @@ -407,6 +419,11 @@ static inline pid_t seccomp_can_sync_threads(void)
> [...]
> > +               /* don't allow TSYNC to install multiple listeners */
> > +               if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER &&
> > +                   !has_listener_parent(thread->seccomp.filter))
> > +                       continue;
> [...]
> > @@ -1462,12 +1479,9 @@ static const struct file_operations seccomp_notify_ops = {
> >  static struct file *init_listener(struct seccomp_filter *filter)
> [...]
> > -       for (cur = current->seccomp.filter; cur; cur = cur->prev) {
> > -               if (cur->notif)
> > -                       goto out;
> > -       }
> > +       if (has_listener_parent(current->seccomp.filter))
> > +               goto out;
> 
> I dislike this because it combines a non-locked check and a locked
> check. And I don't think this will work in the case where TSYNC and
> non-TSYNC race - if the non-TSYNC call nests around the TSYNC filter
> installation, the thread that called seccomp in non-TSYNC mode will
> still end up with two notifying filters. How about the following?

Sure, you can add,

Reviewed-by: Tycho Andersen <tycho@tycho.pizza>

when you send it.

Tycho
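
For readers following the thread, the point of contention is that the "is there already a listener?" check and the step that chains the new filter must see the same filter chain. Below is a minimal user-space sketch of that idea, not kernel code: `struct model_filter`, `chain_has_listener()` and `attach_filter()` are hypothetical names invented for illustration, the `has_listener` flag stands in for `cur->notif`, and the model is single-threaded, so the lock that the kernel would hold around check-and-attach is only implied.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a seccomp filter chain (not kernel code). */
struct model_filter {
	struct model_filter *prev;	/* older filter in the chain */
	bool has_listener;		/* stands in for cur->notif != NULL */
};

/* Walk the chain starting at the head that was passed in. */
static bool chain_has_listener(const struct model_filter *head)
{
	const struct model_filter *cur;

	for (cur = head; cur; cur = cur->prev) {
		if (cur->has_listener)
			return true;
	}

	return false;
}

/*
 * Check and attach in one step against the same chain. In the kernel both
 * would have to run under the lock that also guards TSYNC; here the caller
 * is assumed to be the only thread touching *head.
 */
static int attach_filter(struct model_filter **head,
			 struct model_filter *new_filter)
{
	assert(head != NULL && new_filter != NULL);

	if (new_filter->has_listener && chain_has_listener(*head))
		return -EBUSY;

	new_filter->prev = *head;
	*head = new_filter;
	return 0;
}
```

In this model a second listener is refused with -EBUSY regardless of how many non-listener filters sit in between, because the walk starts from the chain that the new filter is about to be attached to.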
