linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Document /proc/pid PID reuse behavior
@ 2018-10-31 15:06 Daniel Colascione
  2018-11-01  7:08 ` Mike Rapoport
  2018-11-05 13:22 ` [PATCH v2] " Daniel Colascione
  0 siblings, 2 replies; 30+ messages in thread
From: Daniel Colascione @ 2018-10-31 15:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: timmurray, joelaf, surenb, Daniel Colascione, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Dennis Zhou (Facebook),
	Kirill A. Shutemov, Prashant Dhamdhere, open list:DOCUMENTATION

State explicitly that holding a /proc/pid file descriptor open does
not reserve the PID. Also note that in the event of PID reuse, these
open file descriptors refer to the old, now-dead process, and not the
new one that happens to be named the same numeric PID.
---
 Documentation/filesystems/proc.txt | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 12a5e6e693b6..567f66a8a23c 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -214,6 +214,14 @@ asynchronous manner and the value may not be very precise. To see a precise
 snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
 It's slow but very precise.
 
+Note that an open a file descriptor to /proc/<pid> or to any of its
+contained files or subdirectories does not prevent <pid> being reused
+for some other process in the event that <pid> exits. Operations on
+open /proc/<pid> file descriptors corresponding to dead processes
+never act on any new process that the kernel may, through chance, have
+also assigned the process ID <pid>. Instead, operations on these FDs
+usually fail with ESRCH.
+
 Table 1-2: Contents of the status files (as of 4.8)
 ..............................................................................
  Field                       Content
-- 
2.19.1.568.g152ad8e336-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] Document /proc/pid PID reuse behavior
  2018-10-31 15:06 [PATCH] Document /proc/pid PID reuse behavior Daniel Colascione
@ 2018-11-01  7:08 ` Mike Rapoport
  2018-11-05 13:22 ` [PATCH v2] " Daniel Colascione
  1 sibling, 0 replies; 30+ messages in thread
From: Mike Rapoport @ 2018-11-01  7:08 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-kernel, timmurray, joelaf, surenb, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Dennis Zhou (Facebook),
	Kirill A. Shutemov, Prashant Dhamdhere, open list:DOCUMENTATION,
	Michael Kerrisk

On Wed, Oct 31, 2018 at 03:06:22PM +0000, Daniel Colascione wrote:
> State explicitly that holding a /proc/pid file descriptor open does
> not reserve the PID. Also note that in the event of PID reuse, these
> open file descriptors refer to the old, now-dead process, and not the
> new one that happens to be named the same numeric PID.

Signed-off is missing.

> ---
>  Documentation/filesystems/proc.txt | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 12a5e6e693b6..567f66a8a23c 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -214,6 +214,14 @@ asynchronous manner and the value may not be very precise. To see a precise
>  snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
>  It's slow but very precise.
> 
> +Note that an open a file descriptor to /proc/<pid> or to any of its
> +contained files or subdirectories does not prevent <pid> being reused
> +for some other process in the event that <pid> exits. Operations on
> +open /proc/<pid> file descriptors corresponding to dead processes
> +never act on any new process that the kernel may, through chance, have
> +also assigned the process ID <pid>. Instead, operations on these FDs
> +usually fail with ESRCH.
> +

I'd put this text in the beginning of the section, just before table 1-1.
Otherwise looks good.

It maybe also useful to update 'man 5 proc' (1) as well

[1] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man5/proc.5

>  Table 1-2: Contents of the status files (as of 4.8)
>  ..............................................................................
>   Field                       Content
> -- 
> 2.19.1.568.g152ad8e336-goog
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2] Document /proc/pid PID reuse behavior
  2018-10-31 15:06 [PATCH] Document /proc/pid PID reuse behavior Daniel Colascione
  2018-11-01  7:08 ` Mike Rapoport
@ 2018-11-05 13:22 ` Daniel Colascione
  2018-11-06  6:01   ` Mike Rapoport
                     ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Daniel Colascione @ 2018-11-05 13:22 UTC (permalink / raw)
  To: linux-kernel, rppt
  Cc: timmurray, joelaf, surenb, Daniel Colascione, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

State explicitly that holding a /proc/pid file descriptor open does
not reserve the PID. Also note that in the event of PID reuse, these
open file descriptors refer to the old, now-dead process, and not the
new one that happens to be named the same numeric PID.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 Documentation/filesystems/proc.txt | 7 +++++++
 1 file changed, 7 insertions(+)

Moved paragraphed to start of /proc/pid section; added signed-off-by.

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 12a5e6e693b6..0b14460f721d 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
 The link  self  points  to  the  process reading the file system. Each process
 subdirectory has the entries listed in Table 1-1.
 
+Note that an open a file descriptor to /proc/<pid> or to any of its
+contained files or subdirectories does not prevent <pid> being reused
+for some other process in the event that <pid> exits. Operations on
+open /proc/<pid> file descriptors corresponding to dead processes
+never act on any new process that the kernel may, through chance, have
+also assigned the process ID <pid>. Instead, operations on these FDs
+usually fail with ESRCH.
 
 Table 1-1: Process specific entries in /proc
 ..............................................................................
-- 
2.19.1.930.g4563a0d9d0-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-05 13:22 ` [PATCH v2] " Daniel Colascione
@ 2018-11-06  6:01   ` Mike Rapoport
  2018-11-07 17:16     ` Matthew Wilcox
  2018-11-06 13:05   ` Michal Hocko
  2018-11-19 10:54   ` Pavel Machek
  2 siblings, 1 reply; 30+ messages in thread
From: Mike Rapoport @ 2018-11-06  6:01 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-kernel, timmurray, joelaf, surenb, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Mon, Nov 05, 2018 at 01:22:05PM +0000, Daniel Colascione wrote:
> State explicitly that holding a /proc/pid file descriptor open does
> not reserve the PID. Also note that in the event of PID reuse, these
> open file descriptors refer to the old, now-dead process, and not the
> new one that happens to be named the same numeric PID.
> 
> Signed-off-by: Daniel Colascione <dancol@google.com>

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> ---
>  Documentation/filesystems/proc.txt | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 12a5e6e693b6..0b14460f721d 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>  The link  self  points  to  the  process reading the file system. Each process
>  subdirectory has the entries listed in Table 1-1.
>  
> +Note that an open a file descriptor to /proc/<pid> or to any of its
> +contained files or subdirectories does not prevent <pid> being reused
> +for some other process in the event that <pid> exits. Operations on
> +open /proc/<pid> file descriptors corresponding to dead processes
> +never act on any new process that the kernel may, through chance, have
> +also assigned the process ID <pid>. Instead, operations on these FDs
> +usually fail with ESRCH.
>  
>  Table 1-1: Process specific entries in /proc
>  ..............................................................................
> -- 
> 2.19.1.930.g4563a0d9d0-goog
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-05 13:22 ` [PATCH v2] " Daniel Colascione
  2018-11-06  6:01   ` Mike Rapoport
@ 2018-11-06 13:05   ` Michal Hocko
  2018-11-07 15:48     ` Daniel Colascione
  2018-11-19 10:54   ` Pavel Machek
  2 siblings, 1 reply; 30+ messages in thread
From: Michal Hocko @ 2018-11-06 13:05 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-kernel, rppt, timmurray, joelaf, surenb, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Mon 05-11-18 13:22:05, Daniel Colascione wrote:
> State explicitly that holding a /proc/pid file descriptor open does
> not reserve the PID. Also note that in the event of PID reuse, these
> open file descriptors refer to the old, now-dead process, and not the
> new one that happens to be named the same numeric PID.

This sounds quite obvious otherwise anybody could simply DoS the system
by consuming all available pids. But I do not see any strong reason
against having that stated explicitly.

> Signed-off-by: Daniel Colascione <dancol@google.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  Documentation/filesystems/proc.txt | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 12a5e6e693b6..0b14460f721d 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>  The link  self  points  to  the  process reading the file system. Each process
>  subdirectory has the entries listed in Table 1-1.
>  
> +Note that an open a file descriptor to /proc/<pid> or to any of its
> +contained files or subdirectories does not prevent <pid> being reused
> +for some other process in the event that <pid> exits. Operations on
> +open /proc/<pid> file descriptors corresponding to dead processes
> +never act on any new process that the kernel may, through chance, have
> +also assigned the process ID <pid>. Instead, operations on these FDs
> +usually fail with ESRCH.
>  
>  Table 1-1: Process specific entries in /proc
>  ..............................................................................
> -- 
> 2.19.1.930.g4563a0d9d0-goog

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-06 13:05   ` Michal Hocko
@ 2018-11-07 15:48     ` Daniel Colascione
  2018-11-07 16:00       ` Michal Hocko
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Colascione @ 2018-11-07 15:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, rppt, Tim Murray, Joel Fernandes,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Tue, Nov 6, 2018 at 1:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Mon 05-11-18 13:22:05, Daniel Colascione wrote:
>> State explicitly that holding a /proc/pid file descriptor open does
>> not reserve the PID. Also note that in the event of PID reuse, these
>> open file descriptors refer to the old, now-dead process, and not the
>> new one that happens to be named the same numeric PID.
>
> This sounds quite obvious

Many people *on* *LKML* were wrong about this behavior. If it's not
obvious to experienced kernel developers, it's certainly not obvious
to the public.

> otherwise anybody could simply DoS the system
> by consuming all available pids.

People can do that today using the instrument of terror widely known
as fork(2). The only thing standing between fork(2) and a full process
table is RLIMIT_NPROC. In a world where we really did reserve a
numeric PID through the lifetime of any struct pid to which it refers
(i.e., where "cd /proc/$PID" held $PID), we could charge these struct
pid reservations against RLIMIT_NPROC and achieve behavior as safe as
what we have today. The details would be subtle (you'd have to take
pains to avoid double-counting, for example), but it could be made to
work. Other people, on the various lkml threads about my process API
improvement proposals, have proposed fixing the longstanding PID race
problem by making struct pid behave the way people mistakenly believe
it behaves today. It's a serious idea worth actual consideration.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-07 15:48     ` Daniel Colascione
@ 2018-11-07 16:00       ` Michal Hocko
  2018-11-07 16:10         ` Daniel Colascione
  2018-11-07 17:04         ` Martin Steigerwald
  0 siblings, 2 replies; 30+ messages in thread
From: Michal Hocko @ 2018-11-07 16:00 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-kernel, rppt, Tim Murray, Joel Fernandes,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Wed 07-11-18 15:48:20, Daniel Colascione wrote:
> On Tue, Nov 6, 2018 at 1:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 05-11-18 13:22:05, Daniel Colascione wrote:
> >> State explicitly that holding a /proc/pid file descriptor open does
> >> not reserve the PID. Also note that in the event of PID reuse, these
> >> open file descriptors refer to the old, now-dead process, and not the
> >> new one that happens to be named the same numeric PID.
> >
> > This sounds quite obvious
> 
> Many people *on* *LKML* were wrong about this behavior. If it's not
> obvious to experienced kernel developers, it's certainly not obvious
> to the public.

Fair enough

> > otherwise anybody could simply DoS the system
> > by consuming all available pids.
> 
> People can do that today using the instrument of terror widely known
> as fork(2). The only thing standing between fork(2) and a full process
> table is RLIMIT_NPROC.

not really. If you really do care about pid space depletion then you
should use pid cgroup controller.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-07 16:00       ` Michal Hocko
@ 2018-11-07 16:10         ` Daniel Colascione
  2018-11-07 16:19           ` Michal Hocko
  2018-11-19 11:16           ` Aleksa Sarai
  2018-11-07 17:04         ` Martin Steigerwald
  1 sibling, 2 replies; 30+ messages in thread
From: Daniel Colascione @ 2018-11-07 16:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, rppt, Tim Murray, Joel Fernandes,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Wed, Nov 7, 2018 at 4:00 PM, Michal Hocko <mhocko@kernel.org> wrote:
> On Wed 07-11-18 15:48:20, Daniel Colascione wrote:
>> On Tue, Nov 6, 2018 at 1:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
>> > otherwise anybody could simply DoS the system
>> > by consuming all available pids.
>>
>> People can do that today using the instrument of terror widely known
>> as fork(2). The only thing standing between fork(2) and a full process
>> table is RLIMIT_NPROC.
>
> not really.

What else, besides memory consumption and (as you mention below)
cgroups? In practice, nobody uses RLIMIT_NPROC, so outside of various
container-y namespaced setups, avoidance of
system-DoS-through-PID-exhaustion isn't a pressing problem.

If you really do care about pid space depletion then you
> should use pid cgroup controller.

Or that, sure. But since cgroups are optional, the problem with the
core model remains. In general, if there's a problem X with the core
system API, and you can mitigate X by using a cgroup, X is still a
problem.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-07 16:10         ` Daniel Colascione
@ 2018-11-07 16:19           ` Michal Hocko
  2018-11-19 11:16           ` Aleksa Sarai
  1 sibling, 0 replies; 30+ messages in thread
From: Michal Hocko @ 2018-11-07 16:19 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-kernel, rppt, Tim Murray, Joel Fernandes,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Wed 07-11-18 16:10:01, Daniel Colascione wrote:
> On Wed, Nov 7, 2018 at 4:00 PM, Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > If you really do care about pid space depletion then you
> > should use pid cgroup controller.
> 
> Or that, sure. But since cgroups are optional, the problem with the
> core model remains. In general, if there's a problem X with the core
> system API, and you can mitigate X by using a cgroup, X is still a
> problem.

I am not questioning that. All that I am saying is that there is a way to
mitigate the issue. This is not the only resource where the standard
rlimit is not sufficient and there is no other way than cgroups.
Actually cgroups were introduced to address rlimit limits IIRC.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-07 16:00       ` Michal Hocko
  2018-11-07 16:10         ` Daniel Colascione
@ 2018-11-07 17:04         ` Martin Steigerwald
  2018-11-08 12:02           ` David Laight
  2018-11-08 13:25           ` Michal Hocko
  1 sibling, 2 replies; 30+ messages in thread
From: Martin Steigerwald @ 2018-11-07 17:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Daniel Colascione, linux-kernel, rppt, Tim Murray,
	Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

Michal Hocko - 07.11.18, 17:00:
> > > otherwise anybody could simply DoS the system
> > > by consuming all available pids.
> > 
> > People can do that today using the instrument of terror widely known
> > as fork(2). The only thing standing between fork(2) and a full
> > process table is RLIMIT_NPROC.
> 
> not really. If you really do care about pid space depletion then you
> should use pid cgroup controller.

Its not quite on-topic, but I am curious now: AFAIK PID limit is 16 
bits. Right? Could it be raised to 32 bits? I bet it would be a major 
change throughout different parts of the kernel.

16 bits sound a bit low these days, not only for PIDs, but also for 
connections / ports.

-- 
Martin



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-06  6:01   ` Mike Rapoport
@ 2018-11-07 17:16     ` Matthew Wilcox
  2018-11-07 18:21       ` Daniel Colascione
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-11-07 17:16 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Daniel Colascione, linux-kernel, timmurray, joelaf, surenb,
	Jonathan Corbet, Andrew Morton, Roman Gushchin, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Tue, Nov 06, 2018 at 08:01:13AM +0200, Mike Rapoport wrote:
> > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> > index 12a5e6e693b6..0b14460f721d 100644
> > --- a/Documentation/filesystems/proc.txt
> > +++ b/Documentation/filesystems/proc.txt
> > @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> >  The link  self  points  to  the  process reading the file system. Each process
> >  subdirectory has the entries listed in Table 1-1.
> >  
> > +Note that an open a file descriptor to /proc/<pid> or to any of its

"open file descriptor" (the "a" is unnecessary)

> > +contained files or subdirectories does not prevent <pid> being reused
> > +for some other process in the event that <pid> exits. Operations on
> > +open /proc/<pid> file descriptors corresponding to dead processes
> > +never act on any new process that the kernel may, through chance, have
> > +also assigned the process ID <pid>. Instead, operations on these FDs
> > +usually fail with ESRCH.

The paragraph is a bit wordy.  More pithy:

An open file descriptor for /proc/<pid> (or any of the files or
subdirectories in it) does not prevent <pid> from being reused after
the process exits.  Operations on a file descriptor referring to a dead
process usually return ESRCH.  They do not act on any new process which
has been assigned the same <pid>.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-07 17:16     ` Matthew Wilcox
@ 2018-11-07 18:21       ` Daniel Colascione
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Colascione @ 2018-11-07 18:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Mike Rapoport, linux-kernel, Tim Murray, Joel Fernandes,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Vlastimil Babka, Kirill A. Shutemov,
	Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Wed, Nov 7, 2018 at 9:16 AM, Matthew Wilcox <willy@infradead.org> wrote:
> On Tue, Nov 06, 2018 at 08:01:13AM +0200, Mike Rapoport wrote:
>> > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> > index 12a5e6e693b6..0b14460f721d 100644
>> > --- a/Documentation/filesystems/proc.txt
>> > +++ b/Documentation/filesystems/proc.txt
>> > @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>> >  The link  self  points  to  the  process reading the file system. Each process
>> >  subdirectory has the entries listed in Table 1-1.
>> >
>> > +Note that an open a file descriptor to /proc/<pid> or to any of its
>
> "open file descriptor" (the "a" is unnecessary)

Thanks for spotting that. I had to re-read that sentence three times
or so before even seeing that extra "a".

>> > +contained files or subdirectories does not prevent <pid> being reused
>> > +for some other process in the event that <pid> exits. Operations on
>> > +open /proc/<pid> file descriptors corresponding to dead processes
>> > +never act on any new process that the kernel may, through chance, have
>> > +also assigned the process ID <pid>. Instead, operations on these FDs
>> > +usually fail with ESRCH.
>
> The paragraph is a bit wordy.  More pithy:
>
> An open file descriptor for /proc/<pid> (or any of the files or
> subdirectories in it) does not prevent <pid> from being reused after
> the process exits.  Operations on a file descriptor referring to a dead
> process usually return ESRCH.  They do not act on any new process which
> has been assigned the same <pid>.

I'll send a new patch version --- unless we can just tweak the patch
when we merge it into the tree?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-07 17:04         ` Martin Steigerwald
@ 2018-11-08 12:02           ` David Laight
  2018-11-08 12:27             ` Matthew Wilcox
  2018-11-08 13:25           ` Michal Hocko
  1 sibling, 1 reply; 30+ messages in thread
From: David Laight @ 2018-11-08 12:02 UTC (permalink / raw)
  To: 'Martin Steigerwald', Michal Hocko
  Cc: Daniel Colascione, linux-kernel, rppt, Tim Murray,
	Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

From: Martin Steigerwald
> Sent: 07 November 2018 17:05
...
> Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> bits. Right? Could it be raised to 32 bits? I bet it would be a major
> change throughout different parts of the kernel.

It is probably 15 bits (since -ve pid numbers are used for process groups).

My guess is that userspace and the system call interface will handle 32bit
(signed) pid numbers.
(I don't remember 'linux emulation' being one of the emulations that
would truncate 32bit pids when one of the BDSs went to 32bit pids.)
The main problem will be that big numbers will mess up the alignment
of printouts from ps and top (etc).
This can be mitigated by only allocating 'big' numbers on systems
that have a lot of pids.
You also really want an O(1) allocator.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-08 12:02           ` David Laight
@ 2018-11-08 12:27             ` Matthew Wilcox
  2018-11-08 13:42               ` David Laight
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-11-08 12:27 UTC (permalink / raw)
  To: David Laight
  Cc: 'Martin Steigerwald',
	Michal Hocko, Daniel Colascione, linux-kernel, rppt, Tim Murray,
	Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Thu, Nov 08, 2018 at 12:02:49PM +0000, David Laight wrote:
> From: Martin Steigerwald
> > Sent: 07 November 2018 17:05
> ...
> > Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> > bits. Right? Could it be raised to 32 bits? I bet it would be a major
> > change throughout different parts of the kernel.
> 
> It is probably 15 bits (since -ve pid numbers are used for process groups).
> 
> My guess is that userspace and the system call interface will handle 32bit
> (signed) pid numbers.
> (I don't remember 'linux emulation' being one of the emulations that
> would truncate 32bit pids when one of the BDSs went to 32bit pids.)
> The main problem will be that big numbers will mess up the alignment
> of printouts from ps and top (etc).
> This can be mitigated by only allocating 'big' numbers on systems
> that have a lot of pids.
> You also really want an O(1) allocator.

The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
n in this case is the highest ID which is still in use.  The tree is
log_64(n) levels high.  It walks to the bottom of the tree and puts a
pointer into the tree.  If the cursor has wrapped to the beginning of
the tree, it may encounter a PID which is still in use; if it does,
it does a bitmap scan of that node, and will then walk up the tree,
doing a bitmap scan forward at each level until it finds a free PID.

So it's not exactly O(log(n)), but it's close enough for all practical
purposes.  And more importantly, it doesn't touch a lot of cachelines.
Two or three at each level of the tree that it accesses.  If we went
all the way to a 32-bit PID, the tree would grow to 6 levels deep,
and worst-case would touch 6 + 5 + 4 levels of the tree (starting with
trying to allocate PID 0xffffffff, failing, trying to allocate PID 300,
then having to walk all the way forward to find PID 0xe0000000), so
that's only 45 cachelines.

People care a little too much about O(1)/O(n) behaviour.  Cacheline
behaviour, and good average-case performance without falling off a cliff
in the worst case is much more important.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-07 17:04         ` Martin Steigerwald
  2018-11-08 12:02           ` David Laight
@ 2018-11-08 13:25           ` Michal Hocko
  1 sibling, 0 replies; 30+ messages in thread
From: Michal Hocko @ 2018-11-08 13:25 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: Daniel Colascione, linux-kernel, rppt, Tim Murray,
	Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Wed 07-11-18 18:04:59, Martin Steigerwald wrote:
> Michal Hocko - 07.11.18, 17:00:
> > > > otherwise anybody could simply DoS the system
> > > > by consuming all available pids.
> > > 
> > > People can do that today using the instrument of terror widely known
> > > as fork(2). The only thing standing between fork(2) and a full
> > > process table is RLIMIT_NPROC.
> > 
> > not really. If you really do care about pid space depletion then you
> > should use pid cgroup controller.
> 
> Its not quite on-topic, but I am curious now: AFAIK PID limit is 16 
> bits. Right? Could it be raised to 32 bits? I bet it would be a major 
> change throughout different parts of the kernel.
> 
> 16 bits sound a bit low these days, not only for PIDs, but also for 
> connections / ports.

Do you have any specific example of the pid space exhaustion? Well
except for a fork bomb attacks that could be mitigated by the pid cgroup
controller.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-08 12:27             ` Matthew Wilcox
@ 2018-11-08 13:42               ` David Laight
  2018-11-08 14:07                 ` Matthew Wilcox
  0 siblings, 1 reply; 30+ messages in thread
From: David Laight @ 2018-11-08 13:42 UTC (permalink / raw)
  To: 'Matthew Wilcox'
  Cc: 'Martin Steigerwald',
	Michal Hocko, Daniel Colascione, linux-kernel, rppt, Tim Murray,
	Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

From: Matthew Wilcox
> Sent: 08 November 2018 12:28
>
> On Thu, Nov 08, 2018 at 12:02:49PM +0000, David Laight wrote:
> > From: Martin Steigerwald
> > > Sent: 07 November 2018 17:05
> > ...
> > > Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> > > bits. Right? Could it be raised to 32 bits? I bet it would be a major
> > > change throughout different parts of the kernel.
> >
> > It is probably 15 bits (since -ve pid numbers are used for process groups).
> >
> > My guess is that userspace and the system call interface will handle 32bit
> > (signed) pid numbers.
> > (I don't remember 'linux emulation' being one of the emulations that
> > would truncate 32bit pids when one of the BDSs went to 32bit pids.)
> > The main problem will be that big numbers will mess up the alignment
> > of printouts from ps and top (etc).
> > This can be mitigated by only allocating 'big' numbers on systems
> > that have a lot of pids.
> > You also really want an O(1) allocator.
> 
> The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
> n in this case is the highest ID which is still in use.  The tree is
> log_64(n) levels high.  It walks to the bottom of the tree and puts a
> pointer into the tree.  If the cursor has wrapped to the beginning of
> the tree, it may encounter a PID which is still in use; if it does,
> it does a bitmap scan of that node, and will then walk up the tree,
> doing a bitmap scan forward at each level until it finds a free PID.

Right, but you can choose the pid so that you get a perfect hash.
You can then put a FIFO free list through the unused entries of
the hash table (just an array).
Then pid allocate just picks the oldest free entry and ups the
high bits (that the hash masks out) to make the old value stale.
Almost no cache lines are involved in the whole operation.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-08 13:42               ` David Laight
@ 2018-11-08 14:07                 ` Matthew Wilcox
  2018-11-08 14:14                   ` David Laight
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-11-08 14:07 UTC (permalink / raw)
  To: David Laight
  Cc: 'Martin Steigerwald',
	Michal Hocko, Daniel Colascione, linux-kernel, rppt, Tim Murray,
	Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Thu, Nov 08, 2018 at 01:42:41PM +0000, David Laight wrote:
> From: Matthew Wilcox
> > On Thu, Nov 08, 2018 at 12:02:49PM +0000, David Laight wrote:
> > > This can be mitigated by only allocating 'big' numbers on systems
> > > that have a lot of pids.
> > > You also really want an O(1) allocator.
> > 
> > The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
> > n in this case is the highest ID which is still in use.  The tree is
> > log_64(n) levels high.  It walks to the bottom of the tree and puts a
> > pointer into the tree.  If the cursor has wrapped to the beginning of
> > the tree, it may encounter a PID which is still in use; if it does,
> > it does a bitmap scan of that node, and will then walk up the tree,
> > doing a bitmap scan forward at each level until it finds a free PID.
> 
> Right, but you can choose the pid so that you get a perfect hash.
> You can then put a FIFO free list through the unused entries of
> the hash table (just an array).
> Then pid allocate just picks the oldest free entry and ups the
> high bits (that the hash masks out) to make the old value stale.
> Almost no cache lines are involved in the whole operation.

You'd be looking for something like dynamic perfect hashing then?
I think that'd make a great project for someone to try out.  I don't
have the time to look into it myself, and I'm not convinced there's
a real workload that would benefit.  But it'd be a great project for
a university student.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-08 14:07                 ` Matthew Wilcox
@ 2018-11-08 14:14                   ` David Laight
  0 siblings, 0 replies; 30+ messages in thread
From: David Laight @ 2018-11-08 14:14 UTC (permalink / raw)
  To: 'Matthew Wilcox'
  Cc: 'Martin Steigerwald',
	Michal Hocko, Daniel Colascione, linux-kernel, rppt, Tim Murray,
	Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

From: Matthew Wilcox
> Sent: 08 November 2018 14:07
...
> 
> You'd be looking for something like dynamic perfect hashing then?
> I think that'd make a great project for someone to try out.  I don't
> have the time to look into it myself, and I'm not convinced there's
> a real workload that would benefit.  But it'd be a great project for
> a university student.

Been there, done that several times.
Half an afternoon's work.

I'm surprised there isn't a C++ container class that will give
you a 'id' for a pointer (or other large value).
But they all do it the other way around.

It is also trivial to double the size of the array and 'unzip'
the allocated items into their correct halves while adding the
other entry to the free list.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-05 13:22 ` [PATCH v2] " Daniel Colascione
  2018-11-06  6:01   ` Mike Rapoport
  2018-11-06 13:05   ` Michal Hocko
@ 2018-11-19 10:54   ` Pavel Machek
  2018-11-19 16:24     ` Daniel Colascione
  2018-11-20  9:05     ` Vlastimil Babka
  2 siblings, 2 replies; 30+ messages in thread
From: Pavel Machek @ 2018-11-19 10:54 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-kernel, rppt, timmurray, joelaf, surenb, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

[-- Attachment #1: Type: text/plain, Size: 1501 bytes --]

On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
> State explicitly that holding a /proc/pid file descriptor open does
> not reserve the PID. Also note that in the event of PID reuse, these
> open file descriptors refer to the old, now-dead process, and not the
> new one that happens to be named the same numeric PID.
> 
> Signed-off-by: Daniel Colascione <dancol@google.com>
> ---
>  Documentation/filesystems/proc.txt | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 12a5e6e693b6..0b14460f721d 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>  The link  self  points  to  the  process reading the file system. Each process
>  subdirectory has the entries listed in Table 1-1.
>  
> +Note that an open a file descriptor to /proc/<pid> or to any of its
> +contained files or subdirectories does not prevent <pid> being reused
> +for some other process in the event that <pid> exits. Operations on

"does not" -> "may not"?

We want to leave this unspecified, so that we can change it in future.
									Pavel


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-07 16:10         ` Daniel Colascione
  2018-11-07 16:19           ` Michal Hocko
@ 2018-11-19 11:16           ` Aleksa Sarai
  1 sibling, 0 replies; 30+ messages in thread
From: Aleksa Sarai @ 2018-11-19 11:16 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Michal Hocko, linux-kernel, rppt, Tim Murray, Joel Fernandes,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

[-- Attachment #1: Type: text/plain, Size: 1483 bytes --]

On 2018-11-07, Daniel Colascione <dancol@google.com> wrote:
> On Wed, Nov 7, 2018 at 4:00 PM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Wed 07-11-18 15:48:20, Daniel Colascione wrote:
> >> On Tue, Nov 6, 2018 at 1:05 PM, Michal Hocko <mhocko@kernel.org> wrote:
> >> > otherwise anybody could simply DoS the system
> >> > by consuming all available pids.
> >>
> >> People can do that today using the instrument of terror widely known
> >> as fork(2). The only thing standing between fork(2) and a full process
> >> table is RLIMIT_NPROC.
> >
> > not really.
> 
> What else, besides memory consumption and (as you mention below)
> cgroups? In practice, nobody uses RLIMIT_NPROC, so outside of various
> container-y namespaced setups, avoidance of
> system-DoS-through-PID-exhaustion isn't a pressing problem.

systemd has had a default pid cgroup controller policy (for both user
and system slices) for a quite long time. I believe that the most recent
version of most enterprise and community distributions support it by
default (and probably even some older versions -- commit 49b786ea146f
was merged in 2015 and I think systemd grew support for it in 2016).

I agree with your overall point, but it should be noted that the vast
majority of Linux systems these days have protections against this (by
default) that use the pids cgroup controller.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-19 10:54   ` Pavel Machek
@ 2018-11-19 16:24     ` Daniel Colascione
  2018-11-20  8:50       ` Pavel Machek
  2018-11-20  9:05     ` Vlastimil Babka
  1 sibling, 1 reply; 30+ messages in thread
From: Daniel Colascione @ 2018-11-19 16:24 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-kernel, Mike Rapoport, Tim Murray, Joel Fernandes,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Mon, Nov 19, 2018 at 2:54 AM, Pavel Machek <pavel@ucw.cz> wrote:
> On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
>> State explicitly that holding a /proc/pid file descriptor open does
>> not reserve the PID. Also note that in the event of PID reuse, these
>> open file descriptors refer to the old, now-dead process, and not the
>> new one that happens to be named the same numeric PID.
>>
>> Signed-off-by: Daniel Colascione <dancol@google.com>
>> ---
>>  Documentation/filesystems/proc.txt | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> Moved paragraphed to start of /proc/pid section; added signed-off-by.
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index 12a5e6e693b6..0b14460f721d 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>>  The link  self  points  to  the  process reading the file system. Each process
>>  subdirectory has the entries listed in Table 1-1.
>>
>> +Note that an open a file descriptor to /proc/<pid> or to any of its
>> +contained files or subdirectories does not prevent <pid> being reused
>> +for some other process in the event that <pid> exits. Operations on
>
> "does not" -> "may not"?
>
> We want to leave this unspecified, so that we can change it in future.

No. "Does not". I'm sick and tired of procfs behavior being vague and
unspecified to the point where even kernel developers have an
incorrect mental model of how it all works. With Christian Brauner's
good work in place and hopefully my own work to follow, we're moving
firmly in the direction of procfs handles being struct pid references.
Instead of waffling further, let's just buy into this model and do the
best job we can of making it work well.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-19 16:24     ` Daniel Colascione
@ 2018-11-20  8:50       ` Pavel Machek
  0 siblings, 0 replies; 30+ messages in thread
From: Pavel Machek @ 2018-11-20  8:50 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-kernel, Mike Rapoport, Tim Murray, Joel Fernandes,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Mike Rapoport, Vlastimil Babka,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

[-- Attachment #1: Type: text/plain, Size: 2413 bytes --]

On Mon 2018-11-19 08:24:02, Daniel Colascione wrote:
> On Mon, Nov 19, 2018 at 2:54 AM, Pavel Machek <pavel@ucw.cz> wrote:
> > On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
> >> State explicitly that holding a /proc/pid file descriptor open does
> >> not reserve the PID. Also note that in the event of PID reuse, these
> >> open file descriptors refer to the old, now-dead process, and not the
> >> new one that happens to be named the same numeric PID.
> >>
> >> Signed-off-by: Daniel Colascione <dancol@google.com>
> >> ---
> >>  Documentation/filesystems/proc.txt | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> >>
> >> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> >> index 12a5e6e693b6..0b14460f721d 100644
> >> --- a/Documentation/filesystems/proc.txt
> >> +++ b/Documentation/filesystems/proc.txt
> >> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> >>  The link  self  points  to  the  process reading the file system. Each process
> >>  subdirectory has the entries listed in Table 1-1.
> >>
> >> +Note that an open a file descriptor to /proc/<pid> or to any of its
> >> +contained files or subdirectories does not prevent <pid> being reused
> >> +for some other process in the event that <pid> exits. Operations on
> >
> > "does not" -> "may not"?
> >
> > We want to leave this unspecified, so that we can change it in future.
> 
> No. "Does not". I'm sick and tired of procfs behavior being vague and
> unspecified to the point where even kernel developers have an

You being sick and tired does not mean this is good idea.

> incorrect mental model of how it all works. With Christian Brauner's
> good work in place and hopefully my own work to follow, we're moving
> firmly in the direction of procfs handles being struct pid references.
> Instead of waffling further, let's just buy into this model and do the
> best job we can of making it work well.

So basically decision is being made now, without seeing Christian's
and your good work. That's not same thing as documentation...

Please don't do this.

NAK.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-19 10:54   ` Pavel Machek
  2018-11-19 16:24     ` Daniel Colascione
@ 2018-11-20  9:05     ` Vlastimil Babka
  2018-11-20  9:18       ` Pavel Machek
                         ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Vlastimil Babka @ 2018-11-20  9:05 UTC (permalink / raw)
  To: Pavel Machek, Daniel Colascione
  Cc: linux-kernel, rppt, timmurray, joelaf, surenb, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Kirill A. Shutemov,
	Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On 11/19/18 11:54 AM, Pavel Machek wrote:
> On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
>> State explicitly that holding a /proc/pid file descriptor open does
>> not reserve the PID. Also note that in the event of PID reuse, these
>> open file descriptors refer to the old, now-dead process, and not the
>> new one that happens to be named the same numeric PID.
>>
>> Signed-off-by: Daniel Colascione <dancol@google.com>
>> ---
>>  Documentation/filesystems/proc.txt | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> Moved paragraphed to start of /proc/pid section; added signed-off-by.
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index 12a5e6e693b6..0b14460f721d 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>>  The link  self  points  to  the  process reading the file system. Each process
>>  subdirectory has the entries listed in Table 1-1.
>>  
>> +Note that an open a file descriptor to /proc/<pid> or to any of its
>> +contained files or subdirectories does not prevent <pid> being reused
>> +for some other process in the event that <pid> exits. Operations on
> 
> "does not" -> "may not"?
> 
> We want to leave this unspecified, so that we can change it in future.

Why can't the documentation describe the current implementation, and
change in the future if the implementation changes? I doubt somebody
would ever rely on the pid being reused while having the descriptor
open. How would that make sense?
 									Pavel
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-20  9:05     ` Vlastimil Babka
@ 2018-11-20  9:18       ` Pavel Machek
  2018-11-20 17:39         ` Matthew Wilcox
  2018-11-20 16:37       ` Joel Fernandes
  2018-11-20 16:49       ` Jonathan Corbet
  2 siblings, 1 reply; 30+ messages in thread
From: Pavel Machek @ 2018-11-20  9:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Daniel Colascione, linux-kernel, rppt, timmurray, joelaf, surenb,
	Jonathan Corbet, Andrew Morton, Roman Gushchin, Mike Rapoport,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

[-- Attachment #1: Type: text/plain, Size: 2157 bytes --]

On Tue 2018-11-20 10:05:21, Vlastimil Babka wrote:
> On 11/19/18 11:54 AM, Pavel Machek wrote:
> > On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
> >> State explicitly that holding a /proc/pid file descriptor open does
> >> not reserve the PID. Also note that in the event of PID reuse, these
> >> open file descriptors refer to the old, now-dead process, and not the
> >> new one that happens to be named the same numeric PID.
> >>
> >> Signed-off-by: Daniel Colascione <dancol@google.com>
> >> ---
> >>  Documentation/filesystems/proc.txt | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> >>
> >> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> >> index 12a5e6e693b6..0b14460f721d 100644
> >> --- a/Documentation/filesystems/proc.txt
> >> +++ b/Documentation/filesystems/proc.txt
> >> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> >>  The link  self  points  to  the  process reading the file system. Each process
> >>  subdirectory has the entries listed in Table 1-1.
> >>  
> >> +Note that an open a file descriptor to /proc/<pid> or to any of its
> >> +contained files or subdirectories does not prevent <pid> being reused
> >> +for some other process in the event that <pid> exits. Operations on
> > 
> > "does not" -> "may not"?
> > 
> > We want to leave this unspecified, so that we can change it in future.
> 
> Why can't the documentation describe the current implementation, and
> change in the future if the implementation changes? I doubt somebody

Documentation should describe "contract" between kernel and userspace.

> would ever rely on the pid being reused while having the descriptor
> open. How would that make sense?

I agree this is corner space, but users might be surprised that
keeping FDs of /proc/pid/X would lead to PID space exhaustion, for
example.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-20  9:05     ` Vlastimil Babka
  2018-11-20  9:18       ` Pavel Machek
@ 2018-11-20 16:37       ` Joel Fernandes
  2018-11-20 16:49       ` Jonathan Corbet
  2 siblings, 0 replies; 30+ messages in thread
From: Joel Fernandes @ 2018-11-20 16:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: pavel, Daniel Colascione, LKML, rppt, Tim Murray,
	Suren Baghdasaryan, Jonathan Corbet, Andrew Morton, guro, rppt,
	kirill.shutemov, dennisszhou, pdhamdhe, open list:DOCUMENTATION

On Tue, Nov 20, 2018 at 1:05 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 11/19/18 11:54 AM, Pavel Machek wrote:
> > On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
> >> State explicitly that holding a /proc/pid file descriptor open does
> >> not reserve the PID. Also note that in the event of PID reuse, these
> >> open file descriptors refer to the old, now-dead process, and not the
> >> new one that happens to be named the same numeric PID.
> >>
> >> Signed-off-by: Daniel Colascione <dancol@google.com>
> >> ---
> >>  Documentation/filesystems/proc.txt | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> >>
> >> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> >> index 12a5e6e693b6..0b14460f721d 100644
> >> --- a/Documentation/filesystems/proc.txt
> >> +++ b/Documentation/filesystems/proc.txt
> >> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> >>  The link  self  points  to  the  process reading the file system. Each process
> >>  subdirectory has the entries listed in Table 1-1.
> >>
> >> +Note that an open a file descriptor to /proc/<pid> or to any of its
> >> +contained files or subdirectories does not prevent <pid> being reused
> >> +for some other process in the event that <pid> exits. Operations on
> >
> > "does not" -> "may not"?
> >
> > We want to leave this unspecified, so that we can change it in future.
>
> Why can't the documentation describe the current implementation, and
> change in the future if the implementation changes? I doubt somebody
> would ever rely on the pid being reused while having the descriptor
> open. How would that make sense?
>

Agreed. I am also of the opinion that this should be documented.

 - Joel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-20  9:05     ` Vlastimil Babka
  2018-11-20  9:18       ` Pavel Machek
  2018-11-20 16:37       ` Joel Fernandes
@ 2018-11-20 16:49       ` Jonathan Corbet
  2018-11-20 16:57         ` Pavel Machek
  2 siblings, 1 reply; 30+ messages in thread
From: Jonathan Corbet @ 2018-11-20 16:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Pavel Machek, Daniel Colascione, linux-kernel, rppt, timmurray,
	joelaf, surenb, Andrew Morton, Roman Gushchin, Mike Rapoport,
	Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Tue, 20 Nov 2018 10:05:21 +0100
Vlastimil Babka <vbabka@suse.cz> wrote:

> Why can't the documentation describe the current implementation, and
> change in the future if the implementation changes? I doubt somebody
> would ever rely on the pid being reused while having the descriptor
> open. How would that make sense?

In the hopes of ending this discussion, I'm going to go ahead and apply
this.  Documenting current behavior is good, especially in situations
where that behavior can surprise people; if the implementation changes,
the docs can change with it.

Thanks,

jon

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-20 16:49       ` Jonathan Corbet
@ 2018-11-20 16:57         ` Pavel Machek
  0 siblings, 0 replies; 30+ messages in thread
From: Pavel Machek @ 2018-11-20 16:57 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Vlastimil Babka, Daniel Colascione, linux-kernel, rppt,
	timmurray, joelaf, surenb, Andrew Morton, Roman Gushchin,
	Mike Rapoport, Kirill A. Shutemov, Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

[-- Attachment #1: Type: text/plain, Size: 985 bytes --]

On Tue 2018-11-20 09:49:50, Jonathan Corbet wrote:
> On Tue, 20 Nov 2018 10:05:21 +0100
> Vlastimil Babka <vbabka@suse.cz> wrote:
> 
> > Why can't the documentation describe the current implementation, and
> > change in the future if the implementation changes? I doubt somebody
> > would ever rely on the pid being reused while having the descriptor
> > open. How would that make sense?
> 
> In the hopes of ending this discussion, I'm going to go ahead and apply
> this.  Documenting current behavior is good, especially in situations
> where that behavior can surprise people; if the implementation changes,
> the docs can change with it.

I'd still prefer changing from "does not" to "may not".

It is really simple change, and once we documented a behaviour, we
really should not be changing it.

Thanks,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-20  9:18       ` Pavel Machek
@ 2018-11-20 17:39         ` Matthew Wilcox
  2018-11-20 17:48           ` Daniel Colascione
  0 siblings, 1 reply; 30+ messages in thread
From: Matthew Wilcox @ 2018-11-20 17:39 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Vlastimil Babka, Daniel Colascione, linux-kernel, rppt,
	timmurray, joelaf, surenb, Jonathan Corbet, Andrew Morton,
	Roman Gushchin, Mike Rapoport, Kirill A. Shutemov,
	Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Tue, Nov 20, 2018 at 10:18:29AM +0100, Pavel Machek wrote:
> > would ever rely on the pid being reused while having the descriptor
> > open. How would that make sense?
> 
> I agree this is corner space, but users might be surprised that
> keeping FDs of /proc/pid/X would lead to PID space exhaustion, for
> example.

We have a limit on the number of FDs a process can have open for a reason.
Well, for many reasons.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-20 17:39         ` Matthew Wilcox
@ 2018-11-20 17:48           ` Daniel Colascione
  2018-11-20 17:59             ` Matthew Wilcox
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Colascione @ 2018-11-20 17:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Pavel Machek, Vlastimil Babka, linux-kernel, Mike Rapoport,
	Tim Murray, Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Kirill A. Shutemov,
	Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Tue, Nov 20, 2018 at 9:39 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Nov 20, 2018 at 10:18:29AM +0100, Pavel Machek wrote:
> > > would ever rely on the pid being reused while having the descriptor
> > > open. How would that make sense?
> >
> > I agree this is corner space, but users might be surprised that
> > keeping FDs of /proc/pid/X would lead to PID space exhaustion, for
> > example.
>
> We have a limit on the number of FDs a process can have open for a reason.
> Well, for many reasons.

And the typical limit is too low. (I've seen people clamp it to 1024
for some reason.) A file descriptor is just a handle to a kernel
resource. All kernel resources held on behalf of applications need
*some* kind of management interface. File descriptors provide a
consistent and uniform instance of such a management interface. Unless
there's a very good reason, nobody should be using non-FD handles for
kernel resource management. A low default FD table size limit is not
an example of one of these good reasons, not when we can raise FD
table size limit. In general, the software projects should not have to
put up with ugly workarounds for limitations they impose on
themselves.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] Document /proc/pid PID reuse behavior
  2018-11-20 17:48           ` Daniel Colascione
@ 2018-11-20 17:59             ` Matthew Wilcox
  0 siblings, 0 replies; 30+ messages in thread
From: Matthew Wilcox @ 2018-11-20 17:59 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Pavel Machek, Vlastimil Babka, linux-kernel, Mike Rapoport,
	Tim Murray, Joel Fernandes, Suren Baghdasaryan, Jonathan Corbet,
	Andrew Morton, Roman Gushchin, Mike Rapoport, Kirill A. Shutemov,
	Dennis Zhou (Facebook),
	Prashant Dhamdhere, open list:DOCUMENTATION

On Tue, Nov 20, 2018 at 09:48:27AM -0800, Daniel Colascione wrote:
> On Tue, Nov 20, 2018 at 9:39 AM Matthew Wilcox <willy@infradead.org> wrote:
> > We have a limit on the number of FDs a process can have open for a reason.
> > Well, for many reasons.
> 
> And the typical limit is too low. (I've seen people clamp it to 1024
> for some reason.)

1024 is the soft limit.  4096 is the default hard limit.  You can always
ask root to set your hard limit higher if that's what you need.

> A file descriptor is just a handle to a kernel
> resource. All kernel resources held on behalf of applications need
> *some* kind of management interface. File descriptors provide a
> consistent and uniform instance of such a management interface. Unless
> there's a very good reason, nobody should be using non-FD handles for
> kernel resource management. A low default FD table size limit is not
> an example of one of these good reasons, not when we can raise FD
> table size limit. In general, the software projects should not have to
> put up with ugly workarounds for limitations they impose on
> themselves.

I'm not really sure why you decided to go off on this rant.  My point to
Pavel was that there's no way a single process can tie up all of the PIDs.
Unless root decided to let them shoot everybody else in the system in
the foot.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-11-20 18:00 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-31 15:06 [PATCH] Document /proc/pid PID reuse behavior Daniel Colascione
2018-11-01  7:08 ` Mike Rapoport
2018-11-05 13:22 ` [PATCH v2] " Daniel Colascione
2018-11-06  6:01   ` Mike Rapoport
2018-11-07 17:16     ` Matthew Wilcox
2018-11-07 18:21       ` Daniel Colascione
2018-11-06 13:05   ` Michal Hocko
2018-11-07 15:48     ` Daniel Colascione
2018-11-07 16:00       ` Michal Hocko
2018-11-07 16:10         ` Daniel Colascione
2018-11-07 16:19           ` Michal Hocko
2018-11-19 11:16           ` Aleksa Sarai
2018-11-07 17:04         ` Martin Steigerwald
2018-11-08 12:02           ` David Laight
2018-11-08 12:27             ` Matthew Wilcox
2018-11-08 13:42               ` David Laight
2018-11-08 14:07                 ` Matthew Wilcox
2018-11-08 14:14                   ` David Laight
2018-11-08 13:25           ` Michal Hocko
2018-11-19 10:54   ` Pavel Machek
2018-11-19 16:24     ` Daniel Colascione
2018-11-20  8:50       ` Pavel Machek
2018-11-20  9:05     ` Vlastimil Babka
2018-11-20  9:18       ` Pavel Machek
2018-11-20 17:39         ` Matthew Wilcox
2018-11-20 17:48           ` Daniel Colascione
2018-11-20 17:59             ` Matthew Wilcox
2018-11-20 16:37       ` Joel Fernandes
2018-11-20 16:49       ` Jonathan Corbet
2018-11-20 16:57         ` Pavel Machek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).