linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Colascione <dancol@google.com>
To: David Laight <David.Laight@aculab.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"timmurray@google.com" <timmurray@google.com>,
	"joelaf@google.com" <joelaf@google.com>
Subject: Re: [RFC PATCH] Minimal non-child process exit notification support
Date: Wed, 31 Oct 2018 14:48:34 +0000	[thread overview]
Message-ID: <CAKOZueuuAGSDwXc2jdZM_=EJdsG0HeGZFLaFWDFNHLLKXJpwbg@mail.gmail.com> (raw)
In-Reply-To: <4ed9af67cf4a46708905d3f392344bcb@AcuMS.aculab.com>

On Wed, Oct 31, 2018 at 2:25 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Daniel Colascione
>> Sent: 31 October 2018 12:56
>> On Wed, Oct 31, 2018 at 12:27 PM, David Laight <David.Laight@aculab.com> wrote:
>> > From: Daniel Colascione
>> >> Sent: 29 October 2018 17:53
>> >>
>> >> This patch adds a new file under /proc/pid, /proc/pid/exithand.
>> >> Attempting to read from an exithand file will block until the
>> >> corresponding process exits, at which point the read will successfully
>> >> complete with EOF.  The file descriptor supports both blocking
>> >> operations and poll(2). It's intended to be a minimal interface for
>> >> allowing a program to wait for the exit of a process that is not one
>> >> of its children.
>> >
>> > Why do you need an extra file?
>>
>> Because no current file suffices.
>
> That doesn't stop you making something work on any/all of the existing files.

Why overload?

>> > It ought to be possible to use poll() to wait for POLLERR having set
>> > 'events' to zero on any of the nodes in /proc/pid - or even on
>> > the directory itself.
>>
>> That doesn't actually work today. And waiting on a directory with
>> POLLERR would be very weird, since directories in general don't do
>> things like blocking reads or poll support. A separate file with
>> self-contained, well-defined semantics is cleaner.
>
> Device drivers will (well ought to) return POLLERR when a device
> is removed.
> Making procfs behave the same way wouldn't be too stupid.

That's overkill. You'd need to add poll support to files throughout
/proc/pid, and I don't think doing so would add any new capabilities
over keeping the changes localized to one new place.

>> > Indeed, to avoid killing the wrong process you need to have opened
>> > some node of /proc/pid/* (maybe cmdline) before sending the kill
>> > signal.
>>
>> The kernel really needs better documentation of the semantics of
>> procfs file descriptors. You're not the only person to think,
>> mistakenly, that keeping a reference to a /proc/$PID/something FD
>> reserves $PID and prevents it being used for another process. Procfs
>> FDs do no such thing. kill(2) is unsafe whether or not
>> /proc/pid/cmdline or any other /proc file is open.
>
> Interesting.
> Linux 'fixed' the problem of pid reuse in the kernel by adding (IIRC)
> 'struct pid' that reference counts the pid stopping reuse.

Struct pid doesn't stop PID reuse. It just allows the kernel to
distinguish between a stale and current reference to a given PID. In a
sense, the "struct pid*" is the "real" name of a process and the
numeric PID is just a convenient way to find a struct pid.

> But since the pids are still allocated sequentially userspace can
> still reference a pid that is freed and immediately reused.
> I'd have thought that procfs nodes held a reference count on the 'struct pid'.
> There's probably no reason why it shouldn't.
>
> Annoyingly non-GPL drivers can't release references to 'struct pid' so
> are very constrained about which processes they can signal.

If I had my way, I'd s/EXPORT_SYMBOL_GPL/EXPORT_SYMBOL/ in pid.c.
These routines seem generally useful. As an alternative, I suppose
drivers could just use the new /proc/pid/kill interface, with
sufficient contortions, just as userspace can.

> I also managed to use a stale 'struct pid' and kill the wrong process
> - much more likely that the pid number being reused.

That shouldn't be possible, unless by "stale 'struct pid'" you mean a
struct pid* referring to a struct pid that's been released and reused
for some other object (possibly a different struct pid instances), and
that's UB.

> If you look at the NetBSD pid allocator you'll see that it uses the
> low pid bits to index an array and the high bits as a sequence number.
> The array slots are also reused LIFO, so you always need a significant
> number of pid allocate/free before a number is reused.
> The non-sequential allocation also makes it significantly more difficult
> to predict when a pid will be reused.
> The table size is doubled when it gets nearly full.

NetBSD is still just papering over the problem. The real issue is that
the whole PID-based process API model is unsafe, and a clever PID
allocator doesn't address the fundamental race condition. As long as
PID reuse is possible at all, there's a potential race condition, and
correctness depends on hope. The only way you could address the PID
race problem while not changing the Unix process API is by making
pid_t ridiculously wide so that it never wraps around.

  parent reply	other threads:[~2018-10-31 14:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-29 17:53 [RFC PATCH] Minimal non-child process exit notification support Daniel Colascione
2018-10-29 19:22 ` [RFC PATCH v2] " Daniel Colascione
2018-11-01  7:00   ` Aleksa Sarai
2018-11-01  7:06     ` Aleksa Sarai
2018-11-01  9:58       ` Christian Brauner
2018-11-01  9:59     ` Daniel Colascione
2018-11-01 10:47       ` Aleksa Sarai
2018-11-01 11:32         ` Daniel Colascione
2018-10-29 19:45 ` [RFC PATCH] " Joel Fernandes
2018-10-29 19:47   ` Joel Fernandes
2018-10-29 20:01   ` Daniel Colascione
2018-10-30  3:06     ` Joel Fernandes
2018-10-30  8:59       ` Daniel Colascione
2018-10-30 22:24         ` Joel Fernandes
2018-10-31 12:27 ` David Laight
2018-10-31 12:56   ` Daniel Colascione
2018-10-31 14:25     ` David Laight
2018-10-31 14:41       ` Joel Fernandes
2018-10-31 14:43         ` Joel Fernandes
2018-10-31 14:48       ` Daniel Colascione [this message]
2018-10-31 16:53 ` Daniel Colascione
     [not found]   ` <175DDE5D-E738-4C35-B664-6AD8B9CF446D@amacapital.net>
2018-10-31 17:44     ` Daniel Colascione

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKOZueuuAGSDwXc2jdZM_=EJdsG0HeGZFLaFWDFNHLLKXJpwbg@mail.gmail.com' \
    --to=dancol@google.com \
    --cc=David.Laight@aculab.com \
    --cc=joelaf@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=timmurray@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).