From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Christian Brauner <christian.brauner@canonical.com>,
	Tycho Andersen <tycho@tycho.pizza>
Cc: linux-man <linux-man@vger.kernel.org>,
	Song Liu <songliubraving@fb.com>, Will Drewry <wad@chromium.org>,
	Kees Cook <keescook@chromium.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jann Horn <jannh@google.com>, Robert Sesek <rsesek@google.com>,
	Linux Containers <containers@lists.linux-foundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	mtk.manpages@gmail.com, Giuseppe Scrivano <gscrivan@redhat.com>,
	bpf <bpf@vger.kernel.org>, Andy Lutomirski <luto@amacapital.net>,
	Christian Brauner <christian@brauner.io>
Subject: Re: For review: seccomp_user_notif(2) manual page
Date: Wed, 14 Oct 2020 07:41:07 +0200
Message-ID: <3a417df2-6346-601d-568e-29307347e6aa@gmail.com>
In-Reply-To: <20201001171206.jvkdx4htqux5agdv@gmail.com>

On 10/1/20 7:12 PM, Christian Brauner wrote:
> On Thu, Oct 01, 2020 at 10:58:50AM -0600, Tycho Andersen wrote:
>> On Thu, Oct 01, 2020 at 05:47:54PM +0200, Jann Horn via Containers wrote:
>>> On Thu, Oct 1, 2020 at 2:54 PM Christian Brauner
>>> <christian.brauner@canonical.com> wrote:
>>>> On Wed, Sep 30, 2020 at 05:53:46PM +0200, Jann Horn via Containers wrote:
>>>>> On Wed, Sep 30, 2020 at 1:07 PM Michael Kerrisk (man-pages)
>>>>> <mtk.manpages@gmail.com> wrote:
>>>>>> NOTES
>>>>>>        The file descriptor returned when seccomp(2) is employed with the
>>>>>>        SECCOMP_FILTER_FLAG_NEW_LISTENER  flag  can  be  monitored  using
>>>>>>        poll(2), epoll(7), and select(2).  When a notification  is  pend‐
>>>>>>        ing,  these interfaces indicate that the file descriptor is read‐
>>>>>>        able.
>>>>>
>>>>> We should probably also point out somewhere that, as
>>>>> include/uapi/linux/seccomp.h says:
>>>>>
>>>>>  * Similar precautions should be applied when stacking SECCOMP_RET_USER_NOTIF
>>>>>  * or SECCOMP_RET_TRACE. For SECCOMP_RET_USER_NOTIF filters acting on the
>>>>>  * same syscall, the most recently added filter takes precedence. This means
>>>>>  * that the new SECCOMP_RET_USER_NOTIF filter can override any
>>>>>  * SECCOMP_IOCTL_NOTIF_SEND from earlier filters, essentially allowing all
>>>>>  * such filtered syscalls to be executed by sending the response
>>>>>  * SECCOMP_USER_NOTIF_FLAG_CONTINUE. Note that SECCOMP_RET_TRACE can equally
>>>>>  * be overriden by SECCOMP_USER_NOTIF_FLAG_CONTINUE.
>>>>>
>>>>> In other words, from a security perspective, you must assume that the
>>>>> target process can bypass any SECCOMP_RET_USER_NOTIF (or
>>>>> SECCOMP_RET_TRACE) filters unless it is completely prohibited from
>>>>> calling seccomp(). This should also be noted over in the main
>>>>> seccomp(2) manpage, especially the SECCOMP_RET_TRACE part.
>>>>
>>>> So I was actually wondering about this when I skimmed this a while
>>>> ago, but then forgot about it again... Afaict, you can only ever load a
>>>> single filter with SECCOMP_FILTER_FLAG_NEW_LISTENER set. If there
>>>> already is a filter with the SECCOMP_FILTER_FLAG_NEW_LISTENER property
>>>> in the task's filter hierarchy then the kernel will refuse to load a new
>>>> one?
>>>>
>>>> static struct file *init_listener(struct seccomp_filter *filter)
>>>> {
>>>>         struct file *ret = ERR_PTR(-EBUSY);
>>>>         struct seccomp_filter *cur;
>>>>
>>>>         for (cur = current->seccomp.filter; cur; cur = cur->prev) {
>>>>                 if (cur->notif)
>>>>                         goto out;
>>>>         }
>>>>
>>>> shouldn't that be sufficient to guarantee that USER_NOTIF filters can't
>>>> override each other for the same task simply because there can only ever
>>>> be a single one?
>>>
>>> Good point. Exceeeept that that check seems ineffective because this
>>> happens before we take the locks that guard against TSYNC, and also
>>> before we decide to which existing filter we want to chain the new
>>> filter. So if two threads race with TSYNC, I think they'll be able to
>>> chain two filters with listeners together.
>>
>> Yep, seems the check needs to also be in seccomp_can_sync_threads() to
>> be totally effective.
>>
>>> I don't know whether we want to eternalize this "only one listener
>>> across all the filters" restriction in the manpage though, or whether
>>> the man page should just say that the kernel currently doesn't support
>>> it but that security-wise you should assume that it might at some
>>> point.
>>
>> This requirement originally came from Andy, arguing that the semantics
>> of this were/are confusing, which still makes sense to me. Perhaps we
>> should do something like the below?
> 
> I think we should either keep this restriction and then cement it in
> the manpage or add a flag to indicate that the notifier is
> non-overridable.
> I don't care about the default too much, i.e. whether it's overridable
> by default and exclusive if opting in or the other way around doesn't
> matter too much. But from a supervisor's perspective it'd be quite nice
> to be able to be sure that a notifier can't be overridden by another
> notifier.
> 
> I think having a flag would provide the greatest flexibility but I agree
> that the semantics of multiple listeners are kinda odd.

So, for now, I have applied the patch at the foot of this mail
to the pages. Does this seem correct?

> Below looks sane to me, though again, I'm not sitting in front of the
> source code.
[...]

Thanks,

Michael

PS Jann, if you see this, I'm still working through your (extensive
and very helpful) review comments. I will be sending a response.

======

diff --git a/man2/seccomp.2 b/man2/seccomp.2
index 9ab07f4ab..45a6984df 100644
--- a/man2/seccomp.2
+++ b/man2/seccomp.2
@@ -221,6 +221,11 @@ return a new user-space notification file descriptor.
 When the filter returns
 .BR SECCOMP_RET_USER_NOTIF
 a notification will be sent to this file descriptor.
+.IP
+At most one seccomp filter using the
+.BR SECCOMP_FILTER_FLAG_NEW_LISTENER
+flag can be installed for a thread.
+.IP
 See
 .BR seccomp_user_notif (2)
 for further details.
@@ -789,6 +794,12 @@ capability in its user namespace, or had not set
 before using
 .BR SECCOMP_SET_MODE_FILTER .
 .TP
+.BR EBUSY
+While installing a new filter, the
+.BR SECCOMP_FILTER_FLAG_NEW_LISTENER
+flag was specified,
+but a previous filter had already been installed with that flag.
+.TP
 .BR EFAULT
 .IR args
 was not a valid address.
diff --git a/man2/seccomp_user_notif.2 b/man2/seccomp_user_notif.2
index a6025e4d4..d1a406f46 100644
--- a/man2/seccomp_user_notif.2
+++ b/man2/seccomp_user_notif.2
@@ -92,6 +92,7 @@ Consequently, the return value  of the (successful)
 .BR seccomp (2)
 call is a new "listening"
 file descriptor that can be used to receive notifications.
+Only one such "listener" can be established.
 .IP \(bu
 In cases where it is appropriate, the seccomp filter returns the action value
 .BR SECCOMP_RET_USER_NOTIF .
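
For illustration only (this is not part of the patch): a minimal
user-space sketch of how the EBUSY case documented above might be
observed. It assumes a kernel and UAPI headers that provide
SECCOMP_FILTER_FLAG_NEW_LISTENER (Linux 5.0 or later) and an
environment that permits seccomp(2). The helper names
install_listener_filter() and second_listener_is_rejected() are
hypothetical, and the filter simply allows every system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper: install an allow-everything filter and ask for a
 * listening file descriptor. Returns the descriptor on success, or -1
 * with errno set on failure.
 */
static int install_listener_filter(void)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(insns) / sizeof(insns[0]),
        .filter = insns,
    };

    /* Required so that an unprivileged caller may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;

    /* glibc has no seccomp() wrapper, so use syscall(2) directly. */
    return (int) syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                         SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog);
}

/*
 * Hypothetical check. Returns:
 *    1  the second listener request failed with EBUSY
 *    0  the second request did not fail with EBUSY
 *   -1  the first filter could not be installed at all (for example,
 *       no kernel support, or seccomp(2) blocked by a sandbox)
 */
static int second_listener_is_rejected(void)
{
    int first, second, rejected;

    first = install_listener_filter();
    if (first == -1)
        return -1;
    assert(first >= 0);     /* on success, seccomp() returned the fd */

    /* Same thread, same flag: this is the case the EBUSY text covers. */
    second = install_listener_filter();
    rejected = (second == -1 && errno == EBUSY);

    if (second != -1)
        close(second);
    close(first);

    return rejected;
}
```

Note that the sketch exercises only the single-threaded case; it says
nothing about the TSYNC race discussed earlier in this thread.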

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


