From: Sargun Dhillon <sargun@sargun.me>
To: Tycho Andersen <tycho@tycho.pizza>
Cc: "Andy Lutomirski" <luto@kernel.org>,
	"Kees Cook" <keescook@chromium.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"Linux Containers" <containers@lists.linux-foundation.org>,
	"Rodrigo Campos" <rodrigo@kinvolk.io>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Mauricio Vásquez Bernal" <mauricio@kinvolk.io>,
	"Giuseppe Scrivano" <gscrivan@redhat.com>,
	"Will Drewry" <wad@chromium.org>,
	"Alban Crequy" <alban@kinvolk.io>
Subject: Re: [PATCH RESEND 2/5] seccomp: Add wait_killable semantic to seccomp user notifier
Date: Tue, 27 Apr 2021 22:10:29 +0000
Message-ID: <20210427221028.GA16602@ircssh-2.c.rugged-nimbus-611.internal>
In-Reply-To: <20210427170753.GA1786245@cisco>

On Tue, Apr 27, 2021 at 11:07:53AM -0600, Tycho Andersen wrote:
> On Tue, Apr 27, 2021 at 09:23:42AM -0700, Andy Lutomirski wrote:
> > On Tue, Apr 27, 2021 at 6:48 AM Tycho Andersen <tycho@tycho.pizza> wrote:
> > >
> > > On Mon, Apr 26, 2021 at 10:15:28PM +0000, Sargun Dhillon wrote:
> > > > On Mon, Apr 26, 2021 at 01:02:29PM -0600, Tycho Andersen wrote:
> > > > > On Mon, Apr 26, 2021 at 11:06:07AM -0700, Sargun Dhillon wrote:
> > > > > > @@ -1103,11 +1111,31 @@ static int seccomp_do_user_notification(int this_syscall,
> > > > > >    * This is where we wait for a reply from userspace.
> > > > > >    */
> > > > > >   do {
> > > > > > +         interruptible = notification_interruptible(&n);
> > > > > > +
> > > > > >           mutex_unlock(&match->notify_lock);
> > > > > > -         err = wait_for_completion_interruptible(&n.ready);
> > > > > > +         if (interruptible)
> > > > > > +                 err = wait_for_completion_interruptible(&n.ready);
> > > > > > +         else
> > > > > > +                 err = wait_for_completion_killable(&n.ready);
> > > > > >           mutex_lock(&match->notify_lock);
> > > > > > -         if (err != 0)
> > > > > > +
> > > > > > +         if (err != 0) {
> > > > > > +                 /*
> > > > > > +                  * There is a race condition here where if the
> > > > > > +                  * notification was received with the
> > > > > > +                  * SECCOMP_USER_NOTIF_FLAG_WAIT_KILLABLE flag, but a
> > > > > > +                  * non-fatal signal was received before we could
> > > > > > +                  * transition we could erroneously end our wait early.
> > > > > > +                  *
> > > > > > +                  * The next wait for completion will ensure the signal
> > > > > > +                  * was not fatal.
> > > > > > +                  */
> > > > > > +                 if (interruptible && !notification_interruptible(&n))
> > > > > > +                         continue;
> > > > >
> > > > > I'm trying to understand how one would hit this race,
> > > > >
> > > >
> > > > I'm thinking:
> > > > P: Process that "generates" notification
> > > > S: Supervisor
> > > > U: User
> > > >
> > > > P: Generated notification
> > > > S: ioctl(RECV...) // With wait_killable flag.
> > > > ...complete is called in the supervisor, but the P may not be woken up...
> > > > U: kill -SIGTERM $P
> > > > ...signal gets delivered to p and causes wakeup and
> > > > wait_for_completion_interruptible returns 1...
> > > >
> > > > Then you need to check the race
> > >
> > > I see, thanks. This seems like a consequence of having the flag be
> > > per-RECV-call vs. per-filter. Seems like it might be simpler to have
> > > it be per-filter?
> > >
I agree that it is hard / impossible to guarantee correctness *after* the fact.
> > 
> > Backing up a minute, how is the current behavior not a serious
> > correctness issue?  I can think of two scenarios that seem entirely
> > broken right now:
> > 
> > 1. Process makes a syscall that is not permitted to return -EINTR.  It
> > gets a signal and returns -EINTR when user notifiers are in use.
> >
Yes, there's a whole host of problems here. Things like fsmount should not
be interruptible.
 
> > 2. Process makes a syscall that is permitted to return -EINTR.  But
> > -EINTR for IO means "I got interrupted and *did not do the IO*".
> > Nevertheless, the syscall returns -EINTR and the IO is done.
In general, I think that the idea is to do as little side-effect I/O
as possible. The use cases we've looked at all have nice ways to unwind
them (perf_event_open, BPF, accept), but others are harder to unwind
(mount). There are some middle-ground calls like connect, but they're
less bad.

> > 
> > ISTM the current behavior is severely broken, and the new behavior
> > isn't *that* much better since it simply ignores signals and can't
> > emulate -EINTR (or all the various restart modes, sigh).  Surely the
> > right behavior is to have the seccomped process notice that it got a
> > signal and inform the monitor of that fact so that the monitor can
> > take appropriate action.
> 
> This doesn't help your case (2) though, since the IO could be done
> before the supervisor gets the notification.
> 
I think for something like mount, if it fails (gets interrupted) via a
fatal signal, that's grounds for terminating the container.

> > IOW, I don't think that the current behavior *or* the patched opt-in
> > behavior is great.  I think we would do better to have the filter
> > indicate that it is signal-aware and to document that non-signal-aware
> > filters cannot behave correctly with respect to signals.
> 
> I think it would be hard to make a signal-aware filter, it really does
> feel like the only thing to do is a killable wait.
> 
> Tycho

There are plenty of scenarios where the syscall can be handled in an interruptible
fashion. I like to use accept as an example. I think Jann Horn had put together
a patchset on how the supervisor could be notified (as opposed to background
polling). If the call is interrupted, you can just "finish" the accept on restart
of the syscall by handing the FD over.
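
To illustrate what I mean by "finishing" the accept on restart, here is a
rough userspace model of the supervisor-side bookkeeping. It is only a sketch:
the table, handle_accept() and fake_accept() are names made up for this
example (fake_accept() stands in for the real accept() the supervisor would
perform), and nothing here talks to the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical supervisor bookkeeping; none of these names exist in the
 * kernel or in any seccomp library. */
#define MAX_PENDING 16

struct pending_accept {
	bool used;
	int target_pid;  /* task that issued accept() */
	int listen_fd;   /* listening socket number in the target */
	int conn_fd;     /* connection already accepted on its behalf */
};

static struct pending_accept pending[MAX_PENDING];

/* Stand-in for the real accept(): hands out increasing fake fd numbers. */
static int next_fake_conn = 7;
static int fake_accept(void)
{
	return next_fake_conn++;
}

/* Remember a connection that could not be delivered to the target. */
static bool stash_conn(int pid, int listen_fd, int conn_fd)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!pending[i].used) {
			pending[i] = (struct pending_accept){ true, pid, listen_fd, conn_fd };
			return true;
		}
	}
	return false; /* table full */
}

/* Return a stashed connection for this (pid, listen_fd), or -1 if none. */
static int take_stashed_conn(int pid, int listen_fd)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (pending[i].used && pending[i].target_pid == pid &&
		    pending[i].listen_fd == listen_fd) {
			pending[i].used = false;
			return pending[i].conn_fd;
		}
	}
	return -1;
}

/*
 * Handle one accept() notification. `interrupted` models the target being
 * signalled before the reply could be delivered. Returns the fd handed to
 * the target, or -1 if nothing was delivered on this attempt.
 */
static int handle_accept(int pid, int listen_fd, bool interrupted)
{
	int conn = take_stashed_conn(pid, listen_fd);

	if (conn < 0)
		conn = fake_accept(); /* no leftover work: do a fresh accept */
	assert(conn >= 0);

	if (interrupted) {
		/* Keep the connection for the restarted syscall. If the table
		 * is full, a real supervisor would have to close conn here. */
		(void)stash_conn(pid, listen_fd, conn);
		return -1;
	}
	return conn;
}
```

With this, an interrupted first attempt followed by a restart hands over the
connection from the first attempt instead of accepting (and possibly losing)
a second one.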

I see a handful of paths forward:

* We add a new action USER_NOTIF_KILLABLE which requires a fatal signal
  in order to be interrupted
* We add a chunk of data to the USER_NOTIF return code (say, WAIT_KILLABLE)
  from the BPF filter that indicates what kind of wait should happen
* (what is happening now) An ioctl flag that says to pick up the notification
  and put it into a wait_killable state
* An ioctl "command" that puts an existing notification in progress into
  the wait_killable state.
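
For the third option (what the patch does now), the wait rule and the recheck
after a wakeup can be modelled in plain C roughly as follows. This is a sketch
under assumptions: the state names and helpers are invented for illustration,
and wait_mode_for() is only my guess at the shape of the check, since the body
of notification_interruptible() is not shown in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model only; these types are not the kernel's. */
enum wait_mode { WAIT_INTERRUPTIBLE, WAIT_KILLABLE };
enum notif_state { NOTIF_INIT, NOTIF_SENT, NOTIF_REPLIED };

struct notif_model {
	enum notif_state state;
	bool wait_killable; /* supervisor received it with the killable flag */
};

/* Assumed rule: the wait becomes killable only once the supervisor has
 * picked the notification up with the flag set. */
static enum wait_mode wait_mode_for(const struct notif_model *n)
{
	assert(n != NULL);
	if (n->state == NOTIF_SENT && n->wait_killable)
		return WAIT_KILLABLE;
	return WAIT_INTERRUPTIBLE;
}

/* Would a signal end a wait of the given mode? */
static bool signal_ends_wait(enum wait_mode mode, bool fatal)
{
	return fatal || mode == WAIT_INTERRUPTIBLE;
}

/*
 * Decision after a signal wakes the waiter. If the wait started out
 * interruptible but the notification has since become killable, the loop
 * waits again, and that killable wait only ends for a fatal signal.
 */
static bool should_abort_after_wakeup(enum wait_mode mode_at_sleep,
				      const struct notif_model *n, bool fatal)
{
	if (mode_at_sleep == WAIT_INTERRUPTIBLE &&
	    wait_mode_for(n) == WAIT_KILLABLE)
		return fatal;
	return signal_ends_wait(mode_at_sleep, fatal);
}
```

The first two options would fix the wait mode before the task ever sleeps, so
the recheck in should_abort_after_wakeup() would not be needed for them.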
