bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: Kees Cook <keescook@chromium.org>
Cc: Tycho Andersen <tycho@tycho.pizza>,
	"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	Sargun Dhillon <sargun@sargun.me>,
	Christian Brauner <christian@brauner.io>,
	linux-man <linux-man@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Will Drewry <wad@chromium.org>, bpf <bpf@vger.kernel.org>,
	Song Liu <songliubraving@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andy Lutomirski <luto@amacapital.net>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Robert Sesek <rsesek@google.com>
Subject: Re: For review: seccomp_user_notif(2) manual page
Date: Mon, 26 Oct 2020 11:31:01 +0100	[thread overview]
Message-ID: <CAG48ez2OWhpH3HHUJSrAmokJ8=SVwKrmQMSw0gEbTJmKE4myCw@mail.gmail.com> (raw)
In-Reply-To: <CAG48ez2b-fnsp8YAR=H5uRMT4bBTid_hyU4m6KavHxDko1Efog@mail.gmail.com>

On Mon, Oct 26, 2020 at 10:51 AM Jann Horn <jannh@google.com> wrote:
> On Mon, Oct 26, 2020 at 1:32 AM Kees Cook <keescook@chromium.org> wrote:
> > On Thu, Oct 01, 2020 at 03:52:02AM +0200, Jann Horn wrote:
> > > On Thu, Oct 1, 2020 at 1:25 AM Tycho Andersen <tycho@tycho.pizza> wrote:
> > > > On Thu, Oct 01, 2020 at 01:11:33AM +0200, Jann Horn wrote:
> > > > > On Thu, Oct 1, 2020 at 1:03 AM Tycho Andersen <tycho@tycho.pizza> wrote:
> > > > > > On Wed, Sep 30, 2020 at 10:34:51PM +0200, Michael Kerrisk (man-pages) wrote:
> > > > > > > On 9/30/20 5:03 PM, Tycho Andersen wrote:
> > > > > > > > On Wed, Sep 30, 2020 at 01:07:38PM +0200, Michael Kerrisk (man-pages) wrote:
> > > > > > > >>        ┌─────────────────────────────────────────────────────┐
> > > > > > > >>        │FIXME                                                │
> > > > > > > >>        ├─────────────────────────────────────────────────────┤
> > > > > > > >>        │From my experiments,  it  appears  that  if  a  SEC‐ │
> > > > > > > >>        │COMP_IOCTL_NOTIF_RECV   is  done  after  the  target │
> > > > > > > >>        │process terminates, then the ioctl()  simply  blocks │
> > > > > > > >>        │(rather than returning an error to indicate that the │
> > > > > > > >>        │target process no longer exists).                    │
> > > > > > > >
> > > > > > > > Yeah, I think Christian wanted to fix this at some point,
> > > > > > >
> > > > > > > Do you have a pointer that discussion? I could not find it with a
> > > > > > > quick search.
> > > > > > >
> > > > > > > > but it's a
> > > > > > > > bit sticky to do.
> > > > > > >
> > > > > > > Can you say a few words about the nature of the problem?
> > > > > >
> > > > > > I remembered wrong, it's actually in the tree: 99cdb8b9a573 ("seccomp:
> > > > > > notify about unused filter"). So maybe there's a bug here?
> > > > >
> > > > > That thing only notifies on ->poll, it doesn't unblock ioctls; and
> > > > > Michael's sample code uses SECCOMP_IOCTL_NOTIF_RECV to wait. So that
> > > > > commit doesn't have any effect on this kind of usage.
> > > >
> > > > Yes, thanks. And the ones stuck in RECV are waiting on a semaphore so
> > > > we don't have a count of all of them, unfortunately.
> > > >
> > > > We could maybe look inside the wait_list, but that will probably make
> > > > people angry :)
> > >
> > > The easiest way would probably be to open-code the semaphore-ish part,
> > > and let the semaphore and poll share the waitqueue. The current code
> > > kind of mirrors the semaphore's waitqueue in the wqh - open-coding the
> > > entire semaphore would IMO be cleaner than that. And it's not like
> > > semaphore semantics are even a good fit for this code anyway.
> > >
> > > Let's see... if we didn't have the existing UAPI to worry about, I'd
> > > do it as follows (*completely* untested). That way, the ioctl would
> > > block exactly until either there actually is a request to deliver or
> > > there are no more users of the filter. The problem is that if we just
> > > apply this patch, existing users of SECCOMP_IOCTL_NOTIF_RECV that use
> > > an event loop and don't set O_NONBLOCK will be screwed. So we'd
> >
> > Wait, why? Do you mean a ioctl calling loop (rather than a poll event
> > loop)?
>
> No, I'm talking about poll event loops.
>
> > I think poll would be fine, but a "try calling RECV and expect to
> > return ENOENT" loop would change. But I don't think anyone would do this
> > exactly because it _currently_ acts like O_NONBLOCK, yes?
> >
> > > probably also have to add some stupid counter in place of the
> > > semaphore's counter that we can use to preserve the old behavior of
> > > returning -ENOENT once for each cancelled request. :(
> >
> > I only see this in Debian Code Search:
> > https://sources.debian.org/src/crun/0.15+dfsg-1/src/libcrun/seccomp_notify.c/?hl=166#L166
> > which is using epoll_wait():
> > https://sources.debian.org/src/crun/0.15+dfsg-1/src/libcrun/container.c/?hl=1326#L1326
> >
> > I expect LXC is using it. :)
>
> The problem is the scenario where a process is interrupted while it's
> waiting for the supervisor to reply.
>
> Consider the following scenario (with supervisor "S" and target "T"; S
> wants to wait for events on two file descriptors seccomp_fd and
> other_fd):
>
> S: starts poll() to wait for events on seccomp_fd and other_fd
> T: performs a syscall that's filtered with RET_USER_NOTIF
> S: poll() returns and signals readiness of seccomp_fd
> T: receives signal SIGUSR1
> T: syscall aborts, enters signal handler
> T: signal handler blocks on unfiltered syscall (e.g. write())
> S: starts SECCOMP_IOCTL_NOTIF_RECV
> S: blocks because no syscalls are pending
>
> Depending on what other_fd is, this could in a worst case even lead to
> a deadlock (if e.g. the signal handler wants to write to stdout, but
> the stdout fd is hooked up to other_fd in the supervisor, but the
> supervisor can't consume the data written because it's stuck in
> seccomp handling).
>
> So we have to ensure that when existing code (like that crun code you
> linked to) triggers this case, SECCOMP_IOCTL_NOTIF_RECV returns
> immediately instead of blocking.

Or I guess we could also just set O_NONBLOCK on the fd by default?
Since the one existing user is eventloop-based...

  reply	other threads:[~2020-10-26 10:33 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-30 11:07 For review: seccomp_user_notif(2) manual page Michael Kerrisk (man-pages)
2020-09-30 15:03 ` Tycho Andersen
2020-09-30 15:11   ` Tycho Andersen
2020-09-30 20:34   ` Michael Kerrisk (man-pages)
2020-09-30 23:03     ` Tycho Andersen
2020-09-30 23:11       ` Jann Horn
2020-09-30 23:24         ` Tycho Andersen
2020-10-01  1:52           ` Jann Horn
2020-10-01  2:14             ` Jann Horn
2020-10-25 16:31               ` Michael Kerrisk (man-pages)
2020-10-26 15:54                 ` Jann Horn
2020-10-27  6:14                   ` Michael Kerrisk (man-pages)
2020-10-27 10:28                     ` Jann Horn
2020-10-28  6:31                       ` Sargun Dhillon
2020-10-28  9:43                         ` Jann Horn
2020-10-28 17:43                           ` Sargun Dhillon
2020-10-28 18:20                             ` Jann Horn
2020-10-01  7:49             ` Michael Kerrisk (man-pages)
2020-10-26  0:32             ` Kees Cook
2020-10-26  9:51               ` Jann Horn
2020-10-26 10:31                 ` Jann Horn [this message]
2020-10-28 22:56                   ` Kees Cook
2020-10-29  1:11                     ` Jann Horn
     [not found]                   ` <20201029021348.GB25673@cisco>
2020-10-29  4:26                     ` Jann Horn
2020-10-28 22:53                 ` Kees Cook
2020-10-29  1:25                   ` Jann Horn
2020-10-01  7:45       ` Michael Kerrisk (man-pages)
2020-10-14  4:40         ` Michael Kerrisk (man-pages)
2020-09-30 15:53 ` Jann Horn
2020-10-01 12:54   ` Christian Brauner
2020-10-01 15:47     ` Jann Horn
2020-10-01 16:58       ` Tycho Andersen
2020-10-01 17:12         ` Christian Brauner
2020-10-14  5:41           ` Michael Kerrisk (man-pages)
2020-10-01 18:18         ` Jann Horn
2020-10-01 18:56           ` Tycho Andersen
2020-10-01 17:05       ` Christian Brauner
2020-10-15 11:24   ` Michael Kerrisk (man-pages)
2020-10-15 20:32     ` Jann Horn
2020-10-16 18:29       ` Michael Kerrisk (man-pages)
2020-10-17  0:25         ` Jann Horn
2020-10-24 12:52           ` Michael Kerrisk (man-pages)
2020-10-26  9:32             ` Jann Horn
2020-10-26  9:47               ` Michael Kerrisk (man-pages)
2020-09-30 23:39 ` Kees Cook
2020-10-15 11:24   ` Michael Kerrisk (man-pages)
2020-10-26  0:19     ` Kees Cook
2020-10-26  9:39       ` Michael Kerrisk (man-pages)
2020-10-01 12:36 ` Christian Brauner
2020-10-15 11:23   ` Michael Kerrisk (man-pages)
2020-10-01 21:06 ` Sargun Dhillon
2020-10-01 23:19   ` Tycho Andersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAG48ez2OWhpH3HHUJSrAmokJ8=SVwKrmQMSw0gEbTJmKE4myCw@mail.gmail.com' \
    --to=jannh@google.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=cyphar@cyphar.com \
    --cc=daniel@iogearbox.net \
    --cc=gscrivan@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mtk.manpages@gmail.com \
    --cc=rsesek@google.com \
    --cc=sargun@sargun.me \
    --cc=songliubraving@fb.com \
    --cc=tycho@tycho.pizza \
    --cc=wad@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).