linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sargun Dhillon <sargun@sargun.me>
To: Rodrigo Campos <rodrigo@kinvolk.io>
Cc: Kees Cook <keescook@chromium.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Will Drewry <wad@chromium.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Alban Crequy <alban@kinvolk.io>
Subject: Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier
Date: Fri, 29 Apr 2022 17:14:37 +0000	[thread overview]
Message-ID: <20220429171437.GA1267404@ircssh-3.c.rugged-nimbus-611.internal> (raw)
In-Reply-To: <CACaBj2bW8XkENHoLNXEprQ_d8=_aLT_XQdjCZtSOiPJis8G_pQ@mail.gmail.com>

On Fri, Apr 29, 2022 at 11:42:15AM +0200, Rodrigo Campos wrote:
> On Fri, Apr 29, 2022 at 4:32 AM Sargun Dhillon <sargun@sargun.me> wrote:
> > the concept is searchable. If the notifying process is signaled prior
> > to the notification being received by the userspace agent, it will
> > be handled as normal.
> 
> Why is that? Why not always handle in the same way (if wait killable
> is set, wait like that)
> 

The goal is to avoid two things:
1. Unncessary work - Often times, we see workloads that implement techniques
   like hedging (Also known as request racing[1]). In fact, RFC3484
   (destination address selection) gets implemented where the DNS library
   will connect to many backend addresses and whichever one comes back first
   "wins".
2. Side effects - We don't want a situation where a syscall is in progress
   that is non-trivial to rollback (mount), and from user space's perspective
   this syscall never completed.

Blocking before the syscall even starts is excessive. When we looked at this
we found that with runtimes like Golang, they can get into a bad situation
if they have many (1000s) of threads that are in the middle of a syscall
because all of them need to elide prior to GC. In this case the runtime
prioritizes the liveness of GC vs. the syscalls.

That being said, there may be some syscalls in a filter that need the suggested 
behaviour. I can imagine introducing a new flag
(say SECCOMP_FILTER_FLAG_WAIT_KILLABLE) that applies to all states.
Alternatively, in one implementation, I put the behaviour in the data
field of the return from the BPF filter.


> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index db10e73d06e0..9291b0843cb2 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -1081,6 +1088,12 @@ static void seccomp_handle_addfd(struct seccomp_kaddfd *addfd, struct seccomp_kn
> >         complete(&addfd->completion);
> >  }
> >
> > +static bool should_sleep_killable(struct seccomp_filter *match,
> > +                                 struct seccomp_knotif *n)
> > +{
> > +       return match->wait_killable_recv && n->state == SECCOMP_NOTIFY_SENT;
> 
> Here for some reason we check the notification state to be SENT.
> 
Because we don't want to block unless the notification has been received
by userspace.

> > +}
> > +
> >  static int seccomp_do_user_notification(int this_syscall,
> >                                         struct seccomp_filter *match,
> >                                         const struct seccomp_data *sd)
> > @@ -1111,11 +1124,25 @@ static int seccomp_do_user_notification(int this_syscall,
> >          * This is where we wait for a reply from userspace.
> >          */
> >         do {
> > +               bool wait_killable = should_sleep_killable(match, &n);
> > +
> 
> So here, the first time this runs this will be false even if the
> wait_killable flag was used in the filter (because that function
> checks the notification state to be sent, that is not true the first
> time)
> 
> Why not just do wait_for_completion_killable if match->wait_killable
> and wait_for_completion_interruptible otherwise? Am I missing
> something?
Again, this is to allow for the notification to be able to be preempted
prior to being received by the supervisor.

> 
> 
> 
> Best,
> Rodrigo
[1]: https://research.google/pubs/pub40801/

  reply	other threads:[~2022-04-29 17:14 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-29  2:31 [PATCH v3 0/2] Handle seccomp notification preemption Sargun Dhillon
2022-04-29  2:31 ` [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier Sargun Dhillon
2022-04-29  9:42   ` Rodrigo Campos
2022-04-29 17:14     ` Sargun Dhillon [this message]
2022-04-29 18:20       ` Kees Cook
2022-05-02 12:48         ` Rodrigo Campos
2022-04-29 18:22   ` Kees Cook
2022-05-02 14:15   ` Rodrigo Campos
2022-05-02 16:04     ` Sargun Dhillon
2022-05-03 14:27       ` Rodrigo Campos
2022-04-29  2:31 ` [PATCH v3 2/2] selftests/seccomp: Add test for wait killable notifier Sargun Dhillon
2022-04-29 18:19   ` Kees Cook
2022-04-29 22:35     ` Sargun Dhillon
2022-04-29 22:43       ` Kees Cook
2022-04-29  9:24 ` [PATCH v3 0/2] Handle seccomp notification preemption Rodrigo Campos

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220429171437.GA1267404@ircssh-3.c.rugged-nimbus-611.internal \
    --to=sargun@sargun.me \
    --cc=alban@kinvolk.io \
    --cc=christian.brauner@ubuntu.com \
    --cc=containers@lists.linux-foundation.org \
    --cc=gscrivan@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=rodrigo@kinvolk.io \
    --cc=wad@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).