linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL RESEND] pidfd changes for v5.1-rc1
@ 2019-03-12 13:52 Christian Brauner
  2019-03-16  1:22 ` Joel Fernandes
  0 siblings, 1 reply; 4+ messages in thread
From: Christian Brauner @ 2019-03-12 13:52 UTC (permalink / raw)
  To: torvalds; +Cc: tglx, linux-kernel, x86, arnd, Christian Brauner

Hi Linus,

This is a resend of the pull request for the pidfd_send_signal() syscall
which I sent last Tuesday. I'm not sure whether you just wanted to take a
closer look.

The following changes since commit f17b5f06cb92ef2250513a1e154c47b78df07d40:

  Linux 5.0-rc4 (2019-01-27 15:18:05 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git tags/pidfd-v5.1-rc1

The patchset introduces the ability to use file descriptors from proc/<pid>
as stable handles on struct pid. Even if a pid is recycled the handle will
not change. For a start these fds can be used to send signals to the
processes they refer to.

With the ability to use /proc/<pid> fds as stable handles on struct pid we
can fix a long-standing issue where after a process has exited its pid can
be reused by another process. If a caller sends a signal to a reused pid it
will end up signaling the wrong process.
With this patchset we enable a variety of use cases. One obvious example is
that we can now safely delegate an important part of process management -
sending signals - to processes other than the parent of a given process by
sending file descriptors around via scm rights and not fearing that the
given process will have been recycled in the meantime.
It also allows for easy testing whether a given process is still alive or
not by sending signal 0 to a pidfd which is quite handy.
There has been some interest in this feature e.g. from systems management
(systemd, glibc) and container managers. I have requested and gotten
comments from glibc to make sure that this syscall is suitable for their
needs as well. In the future I expect it to take on most other pid-based
signal syscalls. But such features are left for the future once they are
needed.

The patchset has been sitting in linux-next for quite a while and has
not caused any issues. It comes with selftests which verify basic
functionality and also test that a recycled pid cannot be signaled via a
pidfd.

Jon has written about a prior version of this patchset. It should cover the
basic functionality since not a lot has changed since then:

https://lwn.net/Articles/773459/

The commit message for the syscall itself is extensively documenting the
syscall, including it's functionality and extensibility.

/* Merge conflict and sycall number coordination */
Please note, there will be a merge conflict between the Jens' io_uring
patch set in the block tree and this tree. To minimize its impact Arnd
worked with Jens and me to coordinate syscall numbers in advance.
pidfd_send_signal() takes 424 and Jens' patchset took 425 to 427.

/* Separate tree on kernel.org */
At the beginning of last merge cycle it was suggested to move this patchset
into a separate tree on kernel.org as there will be more work coming that
will be extending the use of file descriptors for processes. The tree was
announced in January:

https://lore.kernel.org/lkml/20190108234722.bojj5bqowlutymnt@brauner.io/

The pidfd tree is located on kernel.org

https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/

and it's for-next branch is already tracked by Stephen in linux-next since
the beginning of the 5.0 development cycle. I'm prepared to deal with any
fallouts coming from this work going forward.

The only thing that has changed recently in these patches was the addition
of two more Acked-by/Reviewed-by from  David Howells and tglx after the
last round of reviews.

Please consider pulling these changes from the signed pidfd-v5.1-rc1 tag.

Thanks!
Christian

----------------------------------------------------------------
pidfd patches for v5.1-rc1

----------------------------------------------------------------
Christian Brauner (2):
      signal: add pidfd_send_signal() syscall
      selftests: add tests for pidfd_send_signal()

 arch/x86/entry/syscalls/syscall_32.tbl     |   1 +
 arch/x86/entry/syscalls/syscall_64.tbl     |   1 +
 fs/proc/base.c                             |   9 +
 include/linux/proc_fs.h                    |   6 +
 include/linux/syscalls.h                   |   3 +
 include/uapi/asm-generic/unistd.h          |   4 +-
 kernel/signal.c                            | 133 +++++++++-
 kernel/sys_ni.c                            |   1 +
 tools/testing/selftests/Makefile           |   1 +
 tools/testing/selftests/pidfd/Makefile     |   6 +
 tools/testing/selftests/pidfd/pidfd_test.c | 381 +++++++++++++++++++++++++++++
 11 files changed, 539 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/pidfd/Makefile
 create mode 100644 tools/testing/selftests/pidfd/pidfd_test.c

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GIT PULL RESEND] pidfd changes for v5.1-rc1
  2019-03-12 13:52 [GIT PULL RESEND] pidfd changes for v5.1-rc1 Christian Brauner
@ 2019-03-16  1:22 ` Joel Fernandes
  0 siblings, 0 replies; 4+ messages in thread
From: Joel Fernandes @ 2019-03-16  1:22 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Linus Torvalds, Thomas Glexiner, LKML,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	Arnd Bergmann

On Tue, Mar 12, 2019 at 6:53 AM Christian Brauner <christian@brauner.io> wrote:
>
> Hi Linus,
>
> This is a resend of the pull request for the pidfd_send_signal() syscall
> which I sent last Tuesday. I'm not sure whether you just wanted to take a
> closer look.
>
> The following changes since commit f17b5f06cb92ef2250513a1e154c47b78df07d40:
>
>   Linux 5.0-rc4 (2019-01-27 15:18:05 -0800)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git tags/pidfd-v5.1-rc1
>
> The patchset introduces the ability to use file descriptors from proc/<pid>
> as stable handles on struct pid. Even if a pid is recycled the handle will
> not change. For a start these fds can be used to send signals to the
> processes they refer to.

Joel from the Android team here. This will solve a long standing issue we
have with Android's low memory killer daemon (lmkd) where the killing of
a PID is racy with the traditional signal delivery methods. With this new API,
we can kill things correctly in a race free way. I hope this will get merged
soon and I look forward to further developing on top of this (such as
for support knowing when something was killed and waiting for it reliably -
right now we have a very suboptimal 100ms periodic polling loop to
check for process death, whichslows down how fast we can kill processes to
reclaim their memory).

thanks,

- Joel


>
> With the ability to use /proc/<pid> fds as stable handles on struct pid we
> can fix a long-standing issue where after a process has exited its pid can
> be reused by another process. If a caller sends a signal to a reused pid it
> will end up signaling the wrong process.
> With this patchset we enable a variety of use cases. One obvious example is
> that we can now safely delegate an important part of process management -
> sending signals - to processes other than the parent of a given process by
> sending file descriptors around via scm rights and not fearing that the
> given process will have been recycled in the meantime.
> It also allows for easy testing whether a given process is still alive or
> not by sending signal 0 to a pidfd which is quite handy.
> There has been some interest in this feature e.g. from systems management
> (systemd, glibc) and container managers. I have requested and gotten
> comments from glibc to make sure that this syscall is suitable for their
> needs as well. In the future I expect it to take on most other pid-based
> signal syscalls. But such features are left for the future once they are
> needed.
>
> The patchset has been sitting in linux-next for quite a while and has
> not caused any issues. It comes with selftests which verify basic
> functionality and also test that a recycled pid cannot be signaled via a
> pidfd.
>
> Jon has written about a prior version of this patchset. It should cover the
> basic functionality since not a lot has changed since then:
>
> https://lwn.net/Articles/773459/
>
> The commit message for the syscall itself is extensively documenting the
> syscall, including it's functionality and extensibility.
>
> /* Merge conflict and sycall number coordination */
> Please note, there will be a merge conflict between the Jens' io_uring
> patch set in the block tree and this tree. To minimize its impact Arnd
> worked with Jens and me to coordinate syscall numbers in advance.
> pidfd_send_signal() takes 424 and Jens' patchset took 425 to 427.
>
> /* Separate tree on kernel.org */
> At the beginning of last merge cycle it was suggested to move this patchset
> into a separate tree on kernel.org as there will be more work coming that
> will be extending the use of file descriptors for processes. The tree was
> announced in January:
>
> https://lore.kernel.org/lkml/20190108234722.bojj5bqowlutymnt@brauner.io/
>
> The pidfd tree is located on kernel.org
>
> https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/
>
> and it's for-next branch is already tracked by Stephen in linux-next since
> the beginning of the 5.0 development cycle. I'm prepared to deal with any
> fallouts coming from this work going forward.
>
> The only thing that has changed recently in these patches was the addition
> of two more Acked-by/Reviewed-by from  David Howells and tglx after the
> last round of reviews.
>
> Please consider pulling these changes from the signed pidfd-v5.1-rc1 tag.
>
> Thanks!
> Christian
>
> ----------------------------------------------------------------
> pidfd patches for v5.1-rc1
>
> ----------------------------------------------------------------
> Christian Brauner (2):
>       signal: add pidfd_send_signal() syscall
>       selftests: add tests for pidfd_send_signal()
>
>  arch/x86/entry/syscalls/syscall_32.tbl     |   1 +
>  arch/x86/entry/syscalls/syscall_64.tbl     |   1 +
>  fs/proc/base.c                             |   9 +
>  include/linux/proc_fs.h                    |   6 +
>  include/linux/syscalls.h                   |   3 +
>  include/uapi/asm-generic/unistd.h          |   4 +-
>  kernel/signal.c                            | 133 +++++++++-
>  kernel/sys_ni.c                            |   1 +
>  tools/testing/selftests/Makefile           |   1 +
>  tools/testing/selftests/pidfd/Makefile     |   6 +
>  tools/testing/selftests/pidfd/pidfd_test.c | 381 +++++++++++++++++++++++++++++
>  11 files changed, 539 insertions(+), 7 deletions(-)
>  create mode 100644 tools/testing/selftests/pidfd/Makefile
>  create mode 100644 tools/testing/selftests/pidfd/pidfd_test.c

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [GIT PULL RESEND] pidfd changes for v5.1-rc1
  2019-03-13  5:00 Jonathon Kowalski
@ 2019-03-13  9:10 ` Christian Brauner
  0 siblings, 0 replies; 4+ messages in thread
From: Christian Brauner @ 2019-03-13  9:10 UTC (permalink / raw)
  To: Jonathon Kowalski; +Cc: torvalds, arnd, linux-kernel, x86, tglx

On Wed, Mar 13, 2019 at 05:00:57AM +0000, Jonathon Kowalski wrote:
> Hi,
> 
> Thanks for the work on this system call! I am interested in making use of it
> in my process supervisor. It works pretty well and avoids the long-standing
> issue of PID reuse.

Thanks! The systemd folks have been quite excited about this too.

> 
> One thing that instantly came to mind is to be able to delegate killing to
> some third process depending on the confguration. However, I don't see that
> permissions are attached to the open file description, but seemed to be
> checked when calling pidfd_send_signal as they are with kill(2). Is there

It came up during the discussion. We all preferred to have something
simple and not introduce a new permission model.

There's nothing necessarily blocking us from doing this in the future
though. It's not off the table but out of scope for now.

Thanks!
Christian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [GIT PULL RESEND] pidfd changes for v5.1-rc1
@ 2019-03-13  5:00 Jonathon Kowalski
  2019-03-13  9:10 ` Christian Brauner
  0 siblings, 1 reply; 4+ messages in thread
From: Jonathon Kowalski @ 2019-03-13  5:00 UTC (permalink / raw)
  To: christian; +Cc: torvalds, arnd, linux-kernel, x86, tglx

Hi,

Thanks for the work on this system call! I am interested in making use 
of it in my process supervisor. It works pretty well and avoids the 
long-standing issue of PID reuse.

One thing that instantly came to mind is to be able to delegate killing 
to some third process depending on the confguration. However, I don't 
see that permissions are attached to the open file description, but 
seemed to be checked when calling pidfd_send_signal as they are with 
kill(2). Is there any particular reason this was avoided? For instance, 
if a process with CAP_KILL opens the procfd, shouldn't any process that 
uses a descriptor pointing to this same file description be permitted to 
send signals? It would be a lot more useful that way.

There doesn't seem to much benefit of using file descriptors for 
processes otherwise if cannot use them that way, apart from PID reuse.

So, is something like this on the roadmap in the future, and if not, 
what was the reason it was avoided? I don't see a problem with using 
CAP_KILL to not check permissions at call time, otherwise I can see why 
it would be a problem in general (because processes can change credentials).

Regards,
Jonathon Kowalski

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-03-16  1:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-12 13:52 [GIT PULL RESEND] pidfd changes for v5.1-rc1 Christian Brauner
2019-03-16  1:22 ` Joel Fernandes
2019-03-13  5:00 Jonathon Kowalski
2019-03-13  9:10 ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).