All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sargun Dhillon <sargun@sargun.me>
To: Jann Horn <jannh@google.com>
Cc: linux-man <linux-man@vger.kernel.org>,
	Song Liu <songliubraving@fb.com>, Will Drewry <wad@chromium.org>,
	Kees Cook <keescook@chromium.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Robert Sesek <rsesek@google.com>,
	Linux Containers <containers@lists.linux-foundation.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	"Michael Kerrisk \(man-pages\)" <mtk.manpages@gmail.com>,
	bpf <bpf@vger.kernel.org>, Andy Lutomirski <luto@amacapital.net>,
	Christian Brauner <christian@brauner.io>
Subject: Re: For review: seccomp_user_notif(2) manual page
Date: Tue, 27 Oct 2020 23:31:39 -0700	[thread overview]
Message-ID: <CAMp4zn_Qt2MYuoLojn5ikRkr-J5yGimirjevoAvorK5wfzrBHg@mail.gmail.com> (raw)
In-Reply-To: <CAG48ez2Nb95ae+XwZPYRju1KO-Ps_4R6QxN6ioUhOy2Uok=uAg@mail.gmail.com>

On Tue, Oct 27, 2020 at 3:28 AM Jann Horn <jannh@google.com> wrote:
>
> On Tue, Oct 27, 2020 at 7:14 AM Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
> > On 10/26/20 4:54 PM, Jann Horn wrote:
> > > I'm a bit on the fence now on whether non-blocking mode should use
> > > ENOTCONN or not... I guess if we returned ENOENT even when there are
> > > no more listeners, you'd have to disambiguate through the poll()
> > > revents, which would be kinda ugly?
> >
> > I must confess, I'm not quite clear on which two cases you
> > are trying to distinguish. Can you elaborate?
>
> Let's say someone writes a program whose responsibilities are just to
> handle seccomp events and to listen on some other fd for commands. And
> this is implemented with an event loop. Then once all the target
> processes are gone (including zombie reaping), we'll start getting
> EPOLLERR.
>
> If NOTIF_RECV starts returning -ENOTCONN at this point, the event loop
> can just call into the seccomp logic without any arguments; it can
> just call NOTIF_RECV one more time, see the -ENOTCONN, and terminate.
> The downside is that there's one more error code userspace has to
> special-case.
> This would be more consistent with what we'd be doing in the blocking case.
>
> If NOTIF_RECV keeps returning -ENOENT, the event loop has to also tell
> the seccomp logic what the revents are.
>
> I guess it probably doesn't really matter much.

So, in practice, if you're emulating a blocking syscall (such as open,
perf_event_open, or any of a number of other syscalls), you probably
have to do it on a separate thread in the supervisor because you want
to continue to be able to receive new notifications if any other process
generates a seccomp notification event that you need to handle.

In addition to that, some of these syscalls are preemptible, so you need
to poll SECCOMP_IOCTL_NOTIF_ID_VALID to make sure that the program
under supervision hasn't left the syscall.

If we're to implement a mechanism that makes the seccomp ioctl receive
non-blocking, it would be valuable to address this problem as well (getting
a notification when the supervisor is processing a syscall and needs to
preempt it). In the best case, this can be a minor inconvenience, and
in the worst case this can result in weird errors where you're keeping
resources open that the container expects to be closed.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

WARNING: multiple messages have this Message-ID (diff)
From: Sargun Dhillon <sargun@sargun.me>
To: Jann Horn <jannh@google.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	Tycho Andersen <tycho@tycho.pizza>,
	Kees Cook <keescook@chromium.org>,
	Christian Brauner <christian@brauner.io>,
	linux-man <linux-man@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Will Drewry <wad@chromium.org>, bpf <bpf@vger.kernel.org>,
	Song Liu <songliubraving@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andy Lutomirski <luto@amacapital.net>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Robert Sesek <rsesek@google.com>
Subject: Re: For review: seccomp_user_notif(2) manual page
Date: Tue, 27 Oct 2020 23:31:39 -0700	[thread overview]
Message-ID: <CAMp4zn_Qt2MYuoLojn5ikRkr-J5yGimirjevoAvorK5wfzrBHg@mail.gmail.com> (raw)
In-Reply-To: <CAG48ez2Nb95ae+XwZPYRju1KO-Ps_4R6QxN6ioUhOy2Uok=uAg@mail.gmail.com>

On Tue, Oct 27, 2020 at 3:28 AM Jann Horn <jannh@google.com> wrote:
>
> On Tue, Oct 27, 2020 at 7:14 AM Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
> > On 10/26/20 4:54 PM, Jann Horn wrote:
> > > I'm a bit on the fence now on whether non-blocking mode should use
> > > ENOTCONN or not... I guess if we returned ENOENT even when there are
> > > no more listeners, you'd have to disambiguate through the poll()
> > > revents, which would be kinda ugly?
> >
> > I must confess, I'm not quite clear on which two cases you
> > are trying to distinguish. Can you elaborate?
>
> Let's say someone writes a program whose responsibilities are just to
> handle seccomp events and to listen on some other fd for commands. And
> this is implemented with an event loop. Then once all the target
> processes are gone (including zombie reaping), we'll start getting
> EPOLLERR.
>
> If NOTIF_RECV starts returning -ENOTCONN at this point, the event loop
> can just call into the seccomp logic without any arguments; it can
> just call NOTIF_RECV one more time, see the -ENOTCONN, and terminate.
> The downside is that there's one more error code userspace has to
> special-case.
> This would be more consistent with what we'd be doing in the blocking case.
>
> If NOTIF_RECV keeps returning -ENOENT, the event loop has to also tell
> the seccomp logic what the revents are.
>
> I guess it probably doesn't really matter much.

So, in practice, if you're emulating a blocking syscall (such as open,
perf_event_open, or any of a number of other syscalls), you probably
have to do it on a separate thread in the supervisor because you want
to continue to be able to receive new notifications if any other process
generates a seccomp notification event that you need to handle.

In addition to that, some of these syscalls are preemptible, so you need
to poll SECCOMP_IOCTL_NOTIF_ID_VALID to make sure that the program
under supervision hasn't left the syscall.

If we're to implement a mechanism that makes the seccomp ioctl receive
non-blocking, it would be valuable to address this problem as well (getting
a notification when the supervisor is processing a syscall and needs to
preempt it). In the best case, this can be a minor inconvenience, and
in the worst case this can result in weird errors where you're keeping
resources open that the container expects to be closed.

  reply	other threads:[~2020-10-28  6:32 UTC|newest]

Thread overview: 105+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-30 11:07 For review: seccomp_user_notif(2) manual page Michael Kerrisk (man-pages)
2020-09-30 11:07 ` Michael Kerrisk (man-pages)
2020-09-30 15:03 ` Tycho Andersen
2020-09-30 15:03   ` Tycho Andersen
2020-09-30 15:11   ` Tycho Andersen
2020-09-30 15:11     ` Tycho Andersen
2020-09-30 20:34   ` Michael Kerrisk (man-pages)
2020-09-30 20:34     ` Michael Kerrisk (man-pages)
2020-09-30 23:03     ` Tycho Andersen
2020-09-30 23:03       ` Tycho Andersen
2020-09-30 23:11       ` Jann Horn via Containers
2020-09-30 23:11         ` Jann Horn
2020-09-30 23:24         ` Tycho Andersen
2020-09-30 23:24           ` Tycho Andersen
2020-10-01  1:52           ` Jann Horn via Containers
2020-10-01  1:52             ` Jann Horn
2020-10-01  2:14             ` Jann Horn via Containers
2020-10-01  2:14               ` Jann Horn
2020-10-25 16:31               ` Michael Kerrisk (man-pages)
2020-10-25 16:31                 ` Michael Kerrisk (man-pages)
2020-10-26 15:54                 ` Jann Horn via Containers
2020-10-26 15:54                   ` Jann Horn
2020-10-27  6:14                   ` Michael Kerrisk (man-pages)
2020-10-27  6:14                     ` Michael Kerrisk (man-pages)
2020-10-27 10:28                     ` Jann Horn via Containers
2020-10-27 10:28                       ` Jann Horn
2020-10-28  6:31                       ` Sargun Dhillon [this message]
2020-10-28  6:31                         ` Sargun Dhillon
2020-10-28  9:43                         ` Jann Horn via Containers
2020-10-28  9:43                           ` Jann Horn
2020-10-28 17:43                           ` Sargun Dhillon
2020-10-28 17:43                             ` Sargun Dhillon
2020-10-28 18:20                             ` Jann Horn via Containers
2020-10-28 18:20                               ` Jann Horn
2020-10-01  7:49             ` Michael Kerrisk (man-pages)
2020-10-01  7:49               ` Michael Kerrisk (man-pages)
2020-10-26  0:32             ` Kees Cook
2020-10-26  0:32               ` Kees Cook
2020-10-26  9:51               ` Jann Horn via Containers
2020-10-26  9:51                 ` Jann Horn
2020-10-26 10:31                 ` Jann Horn via Containers
2020-10-26 10:31                   ` Jann Horn
2020-10-28 22:56                   ` Kees Cook
2020-10-28 22:56                     ` Kees Cook
2020-10-29  1:11                     ` Jann Horn via Containers
2020-10-29  1:11                       ` Jann Horn
2020-10-29  2:13                   ` Tycho Andersen
2020-10-29  4:26                     ` Jann Horn via Containers
2020-10-29  4:26                       ` Jann Horn
2020-10-28 22:53                 ` Kees Cook
2020-10-28 22:53                   ` Kees Cook
2020-10-29  1:25                   ` Jann Horn via Containers
2020-10-29  1:25                     ` Jann Horn
2020-10-01  7:45       ` Michael Kerrisk (man-pages)
2020-10-01  7:45         ` Michael Kerrisk (man-pages)
2020-10-14  4:40         ` Michael Kerrisk (man-pages)
2020-10-14  4:40           ` Michael Kerrisk (man-pages)
2020-09-30 15:53 ` Jann Horn via Containers
2020-09-30 15:53   ` Jann Horn
2020-10-01 12:54   ` Christian Brauner
2020-10-01 12:54     ` Christian Brauner
2020-10-01 15:47     ` Jann Horn via Containers
2020-10-01 15:47       ` Jann Horn
2020-10-01 16:58       ` Tycho Andersen
2020-10-01 16:58         ` Tycho Andersen
2020-10-01 17:12         ` Christian Brauner
2020-10-01 17:12           ` Christian Brauner
2020-10-14  5:41           ` Michael Kerrisk (man-pages)
2020-10-14  5:41             ` Michael Kerrisk (man-pages)
2020-10-01 18:18         ` Jann Horn via Containers
2020-10-01 18:18           ` Jann Horn
2020-10-01 18:56           ` Tycho Andersen
2020-10-01 18:56             ` Tycho Andersen
2020-10-01 17:05       ` Christian Brauner
2020-10-01 17:05         ` Christian Brauner
2020-10-15 11:24   ` Michael Kerrisk (man-pages)
2020-10-15 11:24     ` Michael Kerrisk (man-pages)
2020-10-15 20:32     ` Jann Horn via Containers
2020-10-15 20:32       ` Jann Horn
2020-10-16 18:29       ` Michael Kerrisk (man-pages)
2020-10-16 18:29         ` Michael Kerrisk (man-pages)
2020-10-17  0:25         ` Jann Horn via Containers
2020-10-17  0:25           ` Jann Horn
2020-10-24 12:52           ` Michael Kerrisk (man-pages)
2020-10-24 12:52             ` Michael Kerrisk (man-pages)
2020-10-26  9:32             ` Jann Horn via Containers
2020-10-26  9:32               ` Jann Horn
2020-10-26  9:47               ` Michael Kerrisk (man-pages)
2020-10-26  9:47                 ` Michael Kerrisk (man-pages)
2020-09-30 23:39 ` Kees Cook
2020-09-30 23:39   ` Kees Cook
2020-10-15 11:24   ` Michael Kerrisk (man-pages)
2020-10-15 11:24     ` Michael Kerrisk (man-pages)
2020-10-26  0:19     ` Kees Cook
2020-10-26  0:19       ` Kees Cook
2020-10-26  9:39       ` Michael Kerrisk (man-pages)
2020-10-26  9:39         ` Michael Kerrisk (man-pages)
2020-10-01 12:36 ` Christian Brauner
2020-10-01 12:36   ` Christian Brauner
2020-10-15 11:23   ` Michael Kerrisk (man-pages)
2020-10-15 11:23     ` Michael Kerrisk (man-pages)
2020-10-01 21:06 ` Sargun Dhillon
2020-10-01 21:06   ` Sargun Dhillon
2020-10-01 23:19   ` Tycho Andersen
2020-10-01 23:19     ` Tycho Andersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMp4zn_Qt2MYuoLojn5ikRkr-J5yGimirjevoAvorK5wfzrBHg@mail.gmail.com \
    --to=sargun@sargun.me \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=daniel@iogearbox.net \
    --cc=gscrivan@redhat.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mtk.manpages@gmail.com \
    --cc=rsesek@google.com \
    --cc=songliubraving@fb.com \
    --cc=wad@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.