From: Suren Baghdasaryan <surenb@google.com>
To: "Serge E. Hallyn" <serge@hallyn.com>
Cc: "Daniel Colascione" <dancol@google.com>,
	"Christian Brauner" <christian@brauner.io>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Sultan Alsawaf" <sultan@kerneltoast.com>,
	"Tim Murray" <timmurray@google.com>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Arve Hjønnevåg" <arve@android.com>,
	"Todd Kjos" <tkjos@android.com>,
	"Martijn Coenen" <maco@android.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"open list:ANDROID DRIVERS" <devel@driverdev.osuosl.org>,
	linux-mm <linux-mm@kvack.org>,
	kernel-team <kernel-team@android.com>,
	"Oleg Nesterov" <oleg@redhat.com>,
	"Andy Lutomirski" <luto@amacapital.net>
Subject: Re: [RFC] simple_lmk: Introduce Simple Low Memory Killer for Android
Date: Sun, 17 Mar 2019 15:02:35 -0700
Message-ID: <CAJuCfpF8m5kTMXu=G4qBcvSbrpFH91GmS43VHuphEa3hDxOq+Q@mail.gmail.com>
In-Reply-To: <20190317171652.GA10567@mail.hallyn.com>

On Sun, Mar 17, 2019 at 10:16 AM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Sun, Mar 17, 2019 at 10:11:10AM -0700, Daniel Colascione wrote:
> > On Sun, Mar 17, 2019 at 9:35 AM Serge E. Hallyn <serge@hallyn.com> wrote:
> > >
> > > On Sun, Mar 17, 2019 at 12:42:40PM +0100, Christian Brauner wrote:
> > > > On Sat, Mar 16, 2019 at 09:53:06PM -0400, Joel Fernandes wrote:
> > > > > On Sat, Mar 16, 2019 at 12:37:18PM -0700, Suren Baghdasaryan wrote:
> > > > > > On Sat, Mar 16, 2019 at 11:57 AM Christian Brauner <christian@brauner.io> wrote:
> > > > > > >
> > > > > > > On Sat, Mar 16, 2019 at 11:00:10AM -0700, Daniel Colascione wrote:
> > > > > > > > On Sat, Mar 16, 2019 at 10:31 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Mar 15, 2019 at 11:49 AM Joel Fernandes <joel@joelfernandes.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Mar 15, 2019 at 07:24:28PM +0100, Christian Brauner wrote:
> > > > > > > > > > [..]
> > > > > > > > > > > > why do we want to add a new syscall (pidfd_wait) though? Why not just use
> > > > > > > > > > > > standard poll/epoll interface on the proc fd like Daniel was suggesting.
> > > > > > > > > > > > AFAIK, once the proc file is opened, the struct pid is essentially pinned
> > > > > > > > > > > > even though the PID number may be reused. Then the caller can just poll.
> > > > > > > > > > > > We can add a waitqueue to struct pid, and wake up any waiters on process
> > > > > > > > > > > > death (A quick look shows task_struct can be mapped to its struct pid) and
> > > > > > > > > > > > also possibly optimize it using Steve's TIF flag idea. No new syscall is
> > > > > > > > > > > > needed then, let me know if I missed something?
> > > > > > > > > > >
> > > > > > > > > > > Huh, I thought that Daniel was against the poll/epoll solution?
> > > > > > > > > >
> > > > > > > > > > Hmm, going through earlier threads, I believe so now. Here was Daniel's
> > > > > > > > > > reasoning about avoiding a notification about process death through proc
> > > > > > > > > > directory fd: http://lkml.iu.edu/hypermail/linux/kernel/1811.0/00232.html
> > > > > > > > > >
> > > > > > > > > > Maybe a dedicated syscall for this would be cleaner after all.
> > > > > > > > >
> > > > > > > > > Ah, I wish I had seen that discussion before...
> > > > > > > > > A syscall makes sense, and it can be non-blocking; we can use
> > > > > > > > > select/poll/epoll if we use an eventfd.
> > > > > > > >
> > > > > > > > Thanks for taking a look.
> > > > > > > >
> > > > > > > > > I would strongly advocate for a
> > > > > > > > > non-blocking version, or at least a non-blocking option.
> > > > > > > >
> > > > > > > > Waiting for FD readiness is *already* blocking or non-blocking
> > > > > > > > according to the caller's desire --- users can pass options they want
> > > > > > > > to poll(2) or whatever. There's no need for any kind of special
> > > > > > > > configuration knob or non-blocking option. We already *have* a
> > > > > > > > non-blocking option that works universally for everything.
> > > > > > > >
> > > > > > > > As I mentioned in the linked thread, waiting for process exit should
> > > > > > > > work just like waiting for bytes to appear on a pipe. Process exit
> > > > > > > > status is just another blob of bytes that a process might receive. A
> > > > > > > > process exit handle ought to be just another information source. The
> > > > > > > > reason the unix process API is so awful is that for whatever reason
> > > > > > > > the original designers treated processes as a special kind of
> > > > > > > > resource instead of fitting them into the otherwise general-purpose
> > > > > > > > unix data-handling API. Let's not repeat that mistake.
> > > > > > > >
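Today's userspace can approximate "process exit as just another information source" with a pipe whose write end only the child holds. When the child exits, the kernel closes that end, the parent's read end polls as ready, and read(2) returns 0 (EOF).

This sketch rests on two assumptions:
- The child does not hand the descriptor to other processes; otherwise EOF would wait for them too.
- The parent still needs waitpid(2) for the actual status, which is part of what a kernel-provided exit handle would remove.

`spawn_with_exit_pipe` and `wait_exit_fd` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that sleeps for `secs` seconds and then
 * exits with `code`. Returns the read end of a pipe whose only write end
 * belongs to the child, so the read end reaches EOF when the child exits. */
static int spawn_with_exit_pipe(unsigned secs, int code, pid_t *child_out)
{
    int fds[2];
    pid_t child;

    if (pipe(fds) < 0)
        return -1;
    child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);          /* child keeps only the write end */
        sleep(secs);
        _exit(code);            /* exiting closes the write end */
    }
    close(fds[1]);              /* parent keeps only the read end */
    *child_out = child;
    return fds[0];
}

/* Wait, pipe-style, for the child behind exit_fd to exit; then reap it and
 * return its exit code, or -1 if it did not exit normally. */
static int wait_exit_fd(int exit_fd, pid_t child)
{
    struct pollfd pfd = { .fd = exit_fd, .events = POLLIN };
    char byte;
    int status;

    while (poll(&pfd, 1, -1) < 0 && errno == EINTR)
        ;                       /* retry if interrupted by a signal */
    ssize_t r = read(exit_fd, &byte, 1);   /* 0 == EOF: the child is gone */
    (void)r;
    close(exit_fd);
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```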
> > > > > > > > > Something like this:
> > > > > > > > >
> > > > > > > > > evfd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
> > > > > > > > > // register eventfd to receive death notification
> > > > > > > > > pidfd_wait(pid_to_kill, evfd);
> > > > > > > > > // kill the process
> > > > > > > > > pidfd_send_signal(pid_to_kill, ...);
> > > > > > > > > // tend to other things
> > > > > > > >
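For reference, eventfd(2) does provide the readiness behavior Suren's pseudocode relies on. The descriptor wraps a 64-bit counter and polls as readable (POLLIN) once a nonzero value has been written. With EFD_NONBLOCK, a read(2) on a zero counter fails with EAGAIN instead of blocking. The sketch shows only that behavior. pidfd_wait() is a proposal in this thread, so `notify()` here is a hypothetical stand-in for whatever would signal the process death.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Readiness check that does not consume anything: POLLIN means the
 * counter is nonzero, i.e. a notification is pending. */
static int is_pending(int evfd)
{
    struct pollfd pfd = { .fd = evfd, .events = POLLIN };
    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}

/* Consume a pending notification without blocking (evfd must have been
 * created with EFD_NONBLOCK). Returns the counter value, 0 if nothing
 * was pending, or -1 on error. A successful read resets the counter. */
static long long consume_notification(int evfd)
{
    uint64_t count;
    ssize_t n = read(evfd, &count, sizeof(count));

    if (n == (ssize_t)sizeof(count))
        return (long long)count;
    if (n < 0 && errno == EAGAIN)
        return 0;
    return -1;
}

/* Hypothetical stand-in for the notifier (in the proposal above, the
 * kernel on process death): add 1 to the counter. */
static int notify(int evfd)
{
    uint64_t one = 1;
    return write(evfd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}
```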
> > > > > > > > Now you've lost me. pidfd_wait should return a *new* FD, not wire up
> > > > > > > > an eventfd.
> > > > > > > >
> > > > > >
> > > > > > Ok, I probably misunderstood your post linked by Joel. I thought your
> > > > > > original proposal was based on being able to poll a file under
> > > > > > /proc/pid and then you changed your mind to have a separate syscall
> > > > > > which I assumed would be a blocking one to wait for process exit.
> > > > > > Maybe you can describe the new interface you are thinking about in
> > > > > > terms of userspace usage like I did above? Several lines of code would
> > > > > > explain more than paragraphs of text.
> > > > >
> > > > > Hey, thanks Suren for the eventfd idea. I agree with Daniel on this. The idea
> > > > > from Daniel here is to wait for process death and exit events by just
> > > > > referring to a stable fd, independent of whatever is going on in /proc.
> > > > >
> > > > > What is needed is something like this (in highly pseudo-code form):
> > > > >
> > > > > pidfd = open("/proc/<pid>", O_DIRECTORY | O_CLOEXEC);
> > > > > wait_fd = pidfd_wait(pidfd);
> > > > > read or poll wait_fd (blocking or non-blocking, whichever you prefer)
> > > > >

Thanks for the explanation, Joel. Now I understand the proposal. I will
think about it some more, and I am looking forward to the implementation
patch.

> > > > > wait_fd will block until the task has either died or been reaped. In both
> > > > > cases, it can return a suitable string such as "dead" or "reaped", although an
> > > > > integer with some predefined meaning is also OK.
> > > > >
> > > > > What this guarantees is that even if the task's PID has been reused, or the
> > > > > task has already died (or died and been reaped), none of these events can race
> > > > > with the code above, and the information passed to the user is race-free and
> > > > > stable / guaranteed.
> > > > >
> > > > > An eventfd does not seem to fit well, because AFAICS passing the raw PID, as in
> > > > > your example, would still race, since the PID could have been reused by
> > > > > another process by the time the eventfd is registered.
> > > > > Also, Andy's idea in [1] seems to use poll flags to communicate various things,
> > > > > which is still not as explicit about the PID's status, so that's a poor API
> > > > > choice compared to the explicit syscall.
> > > > >
> > > > > I am planning to work on a prototype patch based on Daniel's idea and post it
> > > > > soon (I chatted with Daniel about it and will reference him in the posting as
> > > > > well). In that posting I will also summarize all the previous discussions
> > > > > and come up with some tests. I hope to have something soon.
> > > >
> > > > Having pidfd_wait() return another fd will make the syscall harder to
> > > > swallow for a lot of people I reckon.
> > > > What exactly prevents us from making the pidfd itself readable/pollable
> > > > for the exit status? They are "special" fds anyway. I would really like
> > > > to avoid polluting the api with multiple different types of fds if possible.
> > > >
> > > > ret = pidfd_wait(pidfd);
> > > > read or poll pidfd
> > >
> > > I'm not quite clear on what the two steps are doing here.  Is pidfd_wait()
> > > doing a waitpid(2), and the read gets exit status info?
> >
> > pidfd_wait on an open pidfd returns a "wait handle" FD. The wait
>
> That is what you are proposing.  I'm not sure that's what Christian
> was proposing.  'ret' is ambiguous there.  Christian?
>
> > handle works just like a pipe: you can select/epoll/whatever for
> > readability. read(2) on the wait handle (which blocks unless you set
> > O_NONBLOCK, just like a pipe) completes with a siginfo_t when the
> > process to which the wait handle is attached exits. Roughly,
> >
> > int kill_and_wait_for_exit(int pidfd) {
> >   int wait_handle = pidfd_wait(pidfd);
> >   pidfd_send_signal(pidfd, ...);
> >   siginfo_t exit_info;
> >   // Blocks because we haven't configured non-blocking behavior, just like a pipe.
> >   read(wait_handle, &exit_info, sizeof(exit_info));
> >   close(wait_handle);
> >   return exit_info.si_status;
> > }
> >
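Daniel's wait-handle semantics can be emulated in today's userspace, assuming the caller controls process creation. In those semantics, read(2) yields a siginfo_t once the target exits. The emulation works like this:
- An intermediate watcher process forks the target and reports the target's PID over a pipe.
- The watcher waits for the target with waitid(2).
- It then writes the resulting siginfo_t into the same pipe.

This is only a sketch of the proposed behavior, not the proposed pidfd_wait(). It costs an extra process per target. Because kill(2) takes a raw PID, it also does not close the PID-reuse race the pidfd work targets. `spawn_watched`, `struct wait_handle` and `idle_forever` are illustrative names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct wait_handle {
    int fd;          /* read end: yields one siginfo_t when the target exits */
    pid_t target;    /* the process being watched */
    pid_t watcher;   /* helper process that reaps the target */
};

/* Read exactly len bytes, retrying on EINTR and short reads. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Example target body: idle until a signal terminates the process. */
static void idle_forever(void)
{
    for (;;)
        pause();
}

/* Hypothetical helper: start body() in a new process and return a pipe-like
 * wait handle for it, fed by a watcher process that reaps the target. */
static int spawn_watched(void (*body)(void), struct wait_handle *wh)
{
    int fds[2];
    pid_t watcher;

    if (pipe(fds) < 0)
        return -1;
    watcher = fork();
    if (watcher < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (watcher == 0) {
        siginfo_t info;
        pid_t target;

        close(fds[0]);
        target = fork();
        if (target < 0)
            _exit(1);
        if (target == 0) {
            close(fds[1]);
            body();
            _exit(0);
        }
        if (write(fds[1], &target, sizeof(target)) != (ssize_t)sizeof(target))
            _exit(1);
        memset(&info, 0, sizeof(info));
        while (waitid(P_PID, target, &info, WEXITED) < 0 && errno == EINTR)
            ;
        if (write(fds[1], &info, sizeof(info)) != (ssize_t)sizeof(info))
            _exit(1);
        _exit(0);
    }
    close(fds[1]);
    wh->fd = fds[0];
    wh->watcher = watcher;
    if (read_full(fds[0], &wh->target, sizeof(wh->target)) < 0) {
        close(fds[0]);
        waitpid(watcher, NULL, 0);
        return -1;
    }
    return 0;
}

/* Mirrors kill_and_wait_for_exit() above, using the emulated wait handle. */
static int kill_and_wait_for_exit(struct wait_handle *wh, int sig)
{
    siginfo_t exit_info;
    int rc;

    kill(wh->target, sig);            /* raw PID: still subject to reuse */
    rc = read_full(wh->fd, &exit_info, sizeof(exit_info)); /* blocks, like a pipe */
    close(wh->fd);
    waitpid(wh->watcher, NULL, 0);    /* reap the helper process */
    return rc < 0 ? -1 : exit_info.si_status;
}
```

waitid(2) reports a child killed by a signal with si_code == CLD_KILLED and si_status set to the signal number. So sending SIGTERM to a target running `idle_forever` makes the function return SIGTERM.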
> > >
> > > > (Note that I'm traveling so my responses might be delayed quite a bit.)
> > > > (Ccing a few people that might have an opinion here.)
> > > >
> > > > Christian
> > >
> > > On its own, what you (Christian) show seems nicer.  But think about a main event
> > > loop (like in lxc), where we just loop over epoll_wait() on various descriptors.
> > > If we want to wait for any of several types of events - maybe a signalfd, socket
> > > traffic, or a process death - it would be nice if we can treat them all the same
> > > way, without having to set up a separate thread to watch the pidfd and send
> > > data over another fd.  Is there a nice way we can provide that with what you've
> > > got above?
> >
> > Nobody is proposing any kind of mechanism that would require a
> > separate thread. What I'm proposing works with poll and read and
> > should be trivial to integrate into any existing event loop: from the
> > perspective of the event loop, it looks just like a pipe.
>
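Here is the event-loop case made concrete. Once process exit is represented by a readable descriptor, it can sit in the same epoll(7) set as socket traffic, with no extra thread. Two stand-ins are assumed:
- The exit handle is emulated with a pipe whose write end only the child holds, in place of the proposed wait handle.
- A socketpair(2) plays the role of ordinary socket traffic.

`run_event_loop` is an illustrative name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative event loop: a child sends "hello" over a socket and exits
 * with status 3. The parent watches both the socket and an exit handle
 * (a pipe whose write end only the child holds) in one epoll set.
 * Returns the number of socket bytes received and stores the child's exit
 * code in *exit_code; returns -1 on error. */
static int run_event_loop(int *exit_code)
{
    int sv[2], xp[2], ep, status, total = 0, open_fds = 2;
    pid_t child;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(xp) < 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        close(sv[0]);
        close(xp[0]);                  /* child holds only the write ends */
        _exit(write(sv[1], "hello", 5) == 5 ? 3 : 1);
    }
    close(sv[1]);
    close(xp[1]);

    ep = epoll_create1(EPOLL_CLOEXEC);
    if (ep < 0)
        return -1;
    int watched[2] = { sv[0], xp[0] };
    for (int i = 0; i < 2; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = watched[i] };
        if (epoll_ctl(ep, EPOLL_CTL_ADD, watched[i], &ev) < 0)
            return -1;
    }

    /* Level-triggered loop: keep going until both sources report EOF. */
    while (open_fds > 0) {
        struct epoll_event events[2];
        int n = epoll_wait(ep, events, 2, -1);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            char buf[64];
            ssize_t r = read(fd, buf, sizeof(buf));
            if (r > 0) {
                total += (int)r;       /* only the socket ever carries data */
            } else if (r == 0) {       /* EOF: peer closed or child exited */
                epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                close(fd);
                open_fds--;
            } else if (errno != EINTR) {
                return -1;
            }
        }
    }
    close(ep);
    if (waitpid(child, &status, 0) < 0)
        return -1;
    *exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return total;
}
```

The exit handle gets exactly the same treatment as the socket: register it, wait for readiness, read until EOF. This is the "looks just like a pipe" property argued for above.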
> (yes, I understood your proposal)
>
> -serge
