linux-man.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Jann Horn <jannh@google.com>, Tycho Andersen <tycho@tycho.pizza>
Cc: mtk.manpages@gmail.com, Sargun Dhillon <sargun@sargun.me>,
	Kees Cook <keescook@chromium.org>,
	Christian Brauner <christian@brauner.io>,
	linux-man <linux-man@vger.kernel.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Will Drewry <wad@chromium.org>, bpf <bpf@vger.kernel.org>,
	Song Liu <songliubraving@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andy Lutomirski <luto@amacapital.net>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Robert Sesek <rsesek@google.com>
Subject: Re: For review: seccomp_user_notif(2) manual page
Date: Sun, 25 Oct 2020 17:31:57 +0100	[thread overview]
Message-ID: <656a37b5-75e3-0ded-6ba8-3bb57b537b24@gmail.com> (raw)
In-Reply-To: <CAG48ez3kpEDO1x_HfvOM2R9M78Ach9O_4+Pjs-vLLfqvZL+13A@mail.gmail.com>

Hi Jann,

On 10/1/20 4:14 AM, Jann Horn wrote:
> On Thu, Oct 1, 2020 at 3:52 AM Jann Horn <jannh@google.com> wrote:
>> On Thu, Oct 1, 2020 at 1:25 AM Tycho Andersen <tycho@tycho.pizza> wrote:
>>> On Thu, Oct 01, 2020 at 01:11:33AM +0200, Jann Horn wrote:
>>>> On Thu, Oct 1, 2020 at 1:03 AM Tycho Andersen <tycho@tycho.pizza> wrote:
>>>>> On Wed, Sep 30, 2020 at 10:34:51PM +0200, Michael Kerrisk (man-pages) wrote:
>>>>>> On 9/30/20 5:03 PM, Tycho Andersen wrote:
>>>>>>> On Wed, Sep 30, 2020 at 01:07:38PM +0200, Michael Kerrisk (man-pages) wrote:
>>>>>>>>        ┌─────────────────────────────────────────────────────┐
>>>>>>>>        │FIXME                                                │
>>>>>>>>        ├─────────────────────────────────────────────────────┤
>>>>>>>>        │From my experiments,  it  appears  that  if  a  SEC‐ │
>>>>>>>>        │COMP_IOCTL_NOTIF_RECV   is  done  after  the  target │
>>>>>>>>        │process terminates, then the ioctl()  simply  blocks │
>>>>>>>>        │(rather than returning an error to indicate that the │
>>>>>>>>        │target process no longer exists).                    │
>>>>>>>
>>>>>>> Yeah, I think Christian wanted to fix this at some point,
>>>>>>
>>>>>> Do you have a pointer that discussion? I could not find it with a
>>>>>> quick search.
>>>>>>
>>>>>>> but it's a
>>>>>>> bit sticky to do.
>>>>>>
>>>>>> Can you say a few words about the nature of the problem?
>>>>>
>>>>> I remembered wrong, it's actually in the tree: 99cdb8b9a573 ("seccomp:
>>>>> notify about unused filter"). So maybe there's a bug here?
>>>>
>>>> That thing only notifies on ->poll, it doesn't unblock ioctls; and
>>>> Michael's sample code uses SECCOMP_IOCTL_NOTIF_RECV to wait. So that
>>>> commit doesn't have any effect on this kind of usage.
>>>
>>> Yes, thanks. And the ones stuck in RECV are waiting on a semaphore so
>>> we don't have a count of all of them, unfortunately.
>>>
>>> We could maybe look inside the wait_list, but that will probably make
>>> people angry :)
>>
>> The easiest way would probably be to open-code the semaphore-ish part,
>> and let the semaphore and poll share the waitqueue. The current code
>> kind of mirrors the semaphore's waitqueue in the wqh - open-coding the
>> entire semaphore would IMO be cleaner than that. And it's not like
>> semaphore semantics are even a good fit for this code anyway.
>>
>> Let's see... if we didn't have the existing UAPI to worry about, I'd
>> do it as follows (*completely* untested). That way, the ioctl would
>> block exactly until either there actually is a request to deliver or
>> there are no more users of the filter. The problem is that if we just
>> apply this patch, existing users of SECCOMP_IOCTL_NOTIF_RECV that use
>> an event loop and don't set O_NONBLOCK will be screwed. So we'd
>> probably also have to add some stupid counter in place of the
>> semaphore's counter that we can use to preserve the old behavior of
>> returning -ENOENT once for each cancelled request. :(
>>
>> I guess this is a nice point in favor of Michael's usual complaint
>> that if there are no man pages for a feature by the time the feature
>> lands upstream, there's a higher chance that the UAPI will suck
>> forever...
> 
> And I guess this would be the UAPI-compatible version - not actually
> as terrible as I thought it might be. Do y'all want this? If so, feel
> free to either turn this into a proper patch with Co-developed-by, or
> tell me that I should do it and I'll try to get around to turning it
> into something proper.

Thanks for taking a shot at this.

I tried applying the patch below to vanilla 5.9.0.
(There's one typo: s/ENOTCON/ENOTCONN).

It seems not to work though; when I send a signal to my test
target process that is sleeping waiting for the notification
response, the process enters the uninterruptible D state.
Any thoughts?

Thanks,

Michael

> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 676d4af62103..d08c453fcc2c 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -138,7 +138,7 @@ struct seccomp_kaddfd {
>   * @notifications: A list of struct seccomp_knotif elements.
>   */
>  struct notification {
> -       struct semaphore request;
> +       bool canceled_reqs;
>         u64 next_id;
>         struct list_head notifications;
>  };
> @@ -859,7 +859,6 @@ static int seccomp_do_user_notification(int this_syscall,
>         list_add(&n.list, &match->notif->notifications);
>         INIT_LIST_HEAD(&n.addfd);
> 
> -       up(&match->notif->request);
>         wake_up_poll(&match->wqh, EPOLLIN | EPOLLRDNORM);
>         mutex_unlock(&match->notify_lock);
> 
> @@ -901,8 +900,20 @@ static int seccomp_do_user_notification(int this_syscall,
>          * *reattach* to a notifier right now. If one is added, we'll need to
>          * keep track of the notif itself and make sure they match here.
>          */
> -       if (match->notif)
> +       if (match->notif) {
>                 list_del(&n.list);
> +
> +               /*
> +                * We are stuck with a UAPI that requires that after a spurious
> +                * wakeup, SECCOMP_IOCTL_NOTIF_RECV must return immediately.
> +                * This is the tracking for that, keeping track of whether we
> +                * canceled a request after waking waiters, but before userspace
> +                * picked up the notification.
> +                */
> +               if (n.state == SECCOMP_NOTIFY_INIT)
> +                       match->notif->canceled_reqs = true;
> +       }
> +
>  out:
>         mutex_unlock(&match->notify_lock);
> 
> @@ -1178,6 +1189,7 @@ static long seccomp_notify_recv(struct
> seccomp_filter *filter,
>                                 void __user *buf)
>  {
>         struct seccomp_knotif *knotif = NULL, *cur;
> +       DECLARE_WAITQUEUE(wait, current);
>         struct seccomp_notif unotif;
>         ssize_t ret;
> 
> @@ -1190,11 +1202,9 @@ static long seccomp_notify_recv(struct
> seccomp_filter *filter,
> 
>         memset(&unotif, 0, sizeof(unotif));
> 
> -       ret = down_interruptible(&filter->notif->request);
> -       if (ret < 0)
> -               return ret;
> -
>         mutex_lock(&filter->notify_lock);
> +
> +retry:
>         list_for_each_entry(cur, &filter->notif->notifications, list) {
>                 if (cur->state == SECCOMP_NOTIFY_INIT) {
>                         knotif = cur;
> @@ -1202,14 +1212,32 @@ static long seccomp_notify_recv(struct
> seccomp_filter *filter,
>                 }
>         }
> 
> -       /*
> -        * If we didn't find a notification, it could be that the task was
> -        * interrupted by a fatal signal between the time we were woken and
> -        * when we were able to acquire the rw lock.
> -        */
>         if (!knotif) {
> -               ret = -ENOENT;
> -               goto out;
> +               /* This has to happen before checking &filter->users. */
> +               prepare_to_wait(&filter->wqh, &wait, TASK_INTERRUPTIBLE);
> +
> +               /*
> +                * If all users of the filter are gone, throw an error instead
> +                * of pointlessly continuing to block.
> +                */
> +               if (refcount_read(&filter->users) == 0) {
> +                       ret = -ENOTCON;
> +                       goto out;
> +               }
> +               if (filter->notif->canceled_reqs) {
> +                       ret = -ENOENT;
> +                       goto out;
> +               } else {
> +                       /* No notifications pending - wait for one,
> then retry. */
> +                       mutex_unlock(&filter->notify_lock);
> +                       schedule();
> +                       mutex_lock(&filter->notify_lock);
> +                       if (signal_pending(current)) {
> +                               ret = -EINTR;
> +                               goto out;
> +                       }
> +                       goto retry;
> +               }
>         }
> 
>         unotif.id = knotif->id;
> @@ -1220,6 +1248,8 @@ static long seccomp_notify_recv(struct
> seccomp_filter *filter,
>         wake_up_poll(&filter->wqh, EPOLLOUT | EPOLLWRNORM);
>         ret = 0;
>  out:
> +       filter->notif->canceled_reqs = false;
> +       finish_wait(&filter->wqh, &wait);
>         mutex_unlock(&filter->notify_lock);
> 
>         if (ret == 0 && copy_to_user(buf, &unotif, sizeof(unotif))) {
> @@ -1233,10 +1263,8 @@ static long seccomp_notify_recv(struct
> seccomp_filter *filter,
>                  */
>                 mutex_lock(&filter->notify_lock);
>                 knotif = find_notification(filter, unotif.id);
> -               if (knotif) {
> +               if (knotif)
>                         knotif->state = SECCOMP_NOTIFY_INIT;
> -                       up(&filter->notif->request);
> -               }
>                 mutex_unlock(&filter->notify_lock);
>         }
> 
> @@ -1485,7 +1513,6 @@ static struct file *init_listener(struct
> seccomp_filter *filter)
>         if (!filter->notif)
>                 goto out;
> 
> -       sema_init(&filter->notif->request, 0);
>         filter->notif->next_id = get_random_u64();
>         INIT_LIST_HEAD(&filter->notif->notifications);
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

  reply	other threads:[~2020-10-25 16:32 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-30 11:07 Michael Kerrisk (man-pages)
2020-09-30 15:03 ` Tycho Andersen
2020-09-30 15:11   ` Tycho Andersen
2020-09-30 20:34   ` Michael Kerrisk (man-pages)
2020-09-30 23:03     ` Tycho Andersen
2020-09-30 23:11       ` Jann Horn
2020-09-30 23:24         ` Tycho Andersen
2020-10-01  1:52           ` Jann Horn
2020-10-01  2:14             ` Jann Horn
2020-10-25 16:31               ` Michael Kerrisk (man-pages) [this message]
2020-10-26 15:54                 ` Jann Horn
2020-10-27  6:14                   ` Michael Kerrisk (man-pages)
2020-10-27 10:28                     ` Jann Horn
2020-10-28  6:31                       ` Sargun Dhillon
2020-10-28  9:43                         ` Jann Horn
2020-10-28 17:43                           ` Sargun Dhillon
2020-10-28 18:20                             ` Jann Horn
2020-10-01  7:49             ` Michael Kerrisk (man-pages)
2020-10-26  0:32             ` Kees Cook
2020-10-26  9:51               ` Jann Horn
2020-10-26 10:31                 ` Jann Horn
2020-10-28 22:56                   ` Kees Cook
2020-10-29  1:11                     ` Jann Horn
     [not found]                   ` <20201029021348.GB25673@cisco>
2020-10-29  4:26                     ` Jann Horn
2020-10-28 22:53                 ` Kees Cook
2020-10-29  1:25                   ` Jann Horn
2020-10-01  7:45       ` Michael Kerrisk (man-pages)
2020-10-14  4:40         ` Michael Kerrisk (man-pages)
2020-09-30 15:53 ` Jann Horn
2020-10-01 12:54   ` Christian Brauner
2020-10-01 15:47     ` Jann Horn
2020-10-01 16:58       ` Tycho Andersen
2020-10-01 17:12         ` Christian Brauner
2020-10-14  5:41           ` Michael Kerrisk (man-pages)
2020-10-01 18:18         ` Jann Horn
2020-10-01 18:56           ` Tycho Andersen
2020-10-01 17:05       ` Christian Brauner
2020-10-15 11:24   ` Michael Kerrisk (man-pages)
2020-10-15 20:32     ` Jann Horn
2020-10-16 18:29       ` Michael Kerrisk (man-pages)
2020-10-17  0:25         ` Jann Horn
2020-10-24 12:52           ` Michael Kerrisk (man-pages)
2020-10-26  9:32             ` Jann Horn
2020-10-26  9:47               ` Michael Kerrisk (man-pages)
2020-09-30 23:39 ` Kees Cook
2020-10-15 11:24   ` Michael Kerrisk (man-pages)
2020-10-26  0:19     ` Kees Cook
2020-10-26  9:39       ` Michael Kerrisk (man-pages)
2020-10-01 12:36 ` Christian Brauner
2020-10-15 11:23   ` Michael Kerrisk (man-pages)
2020-10-01 21:06 ` Sargun Dhillon
2020-10-01 23:19   ` Tycho Andersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=656a37b5-75e3-0ded-6ba8-3bb57b537b24@gmail.com \
    --to=mtk.manpages@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=cyphar@cyphar.com \
    --cc=daniel@iogearbox.net \
    --cc=gscrivan@redhat.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-man@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=rsesek@google.com \
    --cc=sargun@sargun.me \
    --cc=songliubraving@fb.com \
    --cc=tycho@tycho.pizza \
    --cc=wad@chromium.org \
    --subject='Re: For review: seccomp_user_notif(2) manual page' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).