linux-kernel.vger.kernel.org archive mirror
* [Linux] Linux PID algorithm is BRAINDEAD!
@ 2015-10-10  2:00 Dave Goel
  2015-10-10  3:20 ` yalin wang
  2015-10-10 21:58 ` Theodore Ts'o
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Goel @ 2015-10-10  2:00 UTC (permalink / raw)
  To: linux-kernel, Dave Goel

Pardon the subject line!  I think the PID algo. is actually pretty
good and cheap.


I just think that a very minor tweak could make it *actually* do
what it always intended to do (that is, satisfy the PID-AIM listed below)!
No expanded PID renumbering, no incompatibility introduction, nothing; just
a relatively *minor tweak*.


*** QUICK SUMMARY:


PROBLEM:  PID gets re-used immediately as soon as it is freed.

EXAMPLE: My program with PID 1323 uses temporary files of the form
PID-<something>. It finishes in two days. By this time, Linux has cycled
PIDs back around five times, and the counter happens to be right at 1323
just as the program finishes. As soon as the PID is freed, Linux thus
re-uses PID 1323. A trash-cleaner program (I know, I know) gets
confused: 1323 exited uncleanly, the cleaner sees no 1323 running, so it
proceeds to delete the temp files. But, by that time, the kernel has started
another 1323, which happens to use those very files. The cleaner ends up
deleting files for the wrong, running program!

This is just one example. There are many more examples of race conditions.



A TINY TWEAK AS SOLUTION:

The kernel already tries to do the right thing. The only minor tweak needed
is that:

***When Linux Re-Uses A Pid, It Uses The Pid That Has Been Free For
The Longest.***

That's it!


RESULT:

All PID-related race conditions are eliminated, period. No "good enough"
hacks needed any more by utilities trying to avoid race conditions. A
freshly freed PID will never get re-used immediately any more. No more race
conditions. (The only time you will ever see an immediate re-use any more is
when your machine actually has 32767(!) or (2^22-1) processes!  And, by
that time, you have FAR bigger problems.)


----


**** DETAILS:


You don't have to google very much to see the 1000 algos and other bashisms
that exist to avoid this very race condition. For example, when you want to
read a PID list and delete temporary files based on a PID, the concern
is that Linux might have created a new process with the same PID by the
time you read the file-list.  We could argue endlessly that these bashisms
are stupid and that there are better ways. But, it seems to me that these
better ways are not foolproof either; they are themselves hacks. And, a very
simple tweak could alleviate 100% of these problems.

Now, 32768, or even 2^22, are actually very small numbers. OTOH, 2^200 is
not. In an ideal world, PIDs would just be sampled from the 2^200 space,
and we could declare that there will never be PID re-use or conflicts,
period. If there's a 32-bit limit, it could use multiple bytes and still
use a 2^200 space, etc.  But, all that would be a drastic change, which is
why we are stuck with the 2^15 space.

Is there a way to continue using the 2^15 (or 2^22) space, but more
reasonably?

I argue that the kernel already tries to satisfy a "PID-AIM" and already
tries to Do The Right Thing, but there's just a tiny thinko in its current
implementation that's easily corrected.


PID-AIM:
"No immediate re-use." The aim is to NOT immediately re-use a PID right
after it's been freed.  I argue that this aim can easily be satisfied, even
within the limited 2^15 space.


CURRENT IMPLEMENTATION:
The creators had the right idea. Linux indeed tries to satisfy the PID-AIM
condition. This is why it increments the counter even if the PID is actually
available.

But, looping happens (within a few hours for me). And, as soon as looping
happens, satisfying the PID-AIM goes out the window.

This tweak would ensure that immediate re-use never happens, period.

COMPLEXITY, ETC:

All that the entire system needs is one queue of free PIDs. Any time you
need a PID, take it from the head. Any time a PID is newly freed, push it at
the back of the queue.  That's it! The overhead seems minimal to me.

The queue is initially populated by 2-32768, of course.
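
To make the idea concrete, here is a rough user-space model of the queue
(a sketch only; the names and fixed size are made up, and real kernel code
would of course look different):

```c
/* Toy model of the proposed free-PID FIFO: allocation pops from the
 * head, and a freed PID is pushed at the tail, so the most recently
 * freed PID is always the last one to be re-used. */
#define PID_MAX 32768

static int queue[PID_MAX];
static int head, tail, count;

/* A PID was just freed: it goes to the back of the line. */
static void push_free(int pid)
{
    queue[tail] = pid;
    tail = (tail + 1) % PID_MAX;
    count++;
}

/* A new process needs a PID: take the one free for the longest. */
static int alloc_pid(void)
{
    int pid;

    if (count == 0)
        return -1;              /* table full: no PID available */
    pid = queue[head];
    head = (head + 1) % PID_MAX;
    count--;
    return pid;
}
```

In this model, a PID freed moments ago cannot come back until every other
free PID has been handed out first, which is exactly the PID-AIM above.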

In fact, we could even use a smaller queue and not even activate
queue-mode till it is actually necessary; we could use various optimizing
conditions and shortcuts, say, pushing PIDs into the queue in batches. After
all, it's not STRICTLY necessary to maintain exactly the correct order. The
ONLY aim we really want to satisfy is that the latest-freed PID NOT go near
the *head* of the queue; that's all.  Also, the queue is only even needed
in the first place once you've actually looped around.  So, tiny rescue-disk
bootups would not even need to go to queue-mode (unless they've been
running many days).


(Thanks to jd_1 and _mjg for humoring and discussing this idea when I
presented it on ##kernel.)


Dave Goel (Please CC responses.)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Linux] Linux PID algorithm is BRAINDEAD!
  2015-10-10  2:00 [Linux] Linux PID algorithm is BRAINDEAD! Dave Goel
@ 2015-10-10  3:20 ` yalin wang
  2015-10-10 21:58 ` Theodore Ts'o
  1 sibling, 0 replies; 8+ messages in thread
From: yalin wang @ 2015-10-10  3:20 UTC (permalink / raw)
  To: Dave Goel; +Cc: linux-kernel


> On Oct 10, 2015, at 10:00, Dave Goel <deego3@gmail.com> wrote:
> 
> [...]
> 
> All that the entire system needs is one queue of free PIDs. Any time you
> need a PID, take it from the head. Any time a PID is newly freed, push it at
> the back of the queue.  That's it! The overhead seems minimal to me.
> 
> The queue is initially populated by 2-32768, of course.
> 
> [...]
I see there is pid_ns->last_pid; new PIDs are allocated starting just
after last_pid, which means ever-larger PID numbers are handed out until
the counter wraps. The same result as the queue method you describe,
I think.

Thanks

* Re: [Linux] Linux PID algorithm is BRAINDEAD!
  2015-10-10  2:00 [Linux] Linux PID algorithm is BRAINDEAD! Dave Goel
  2015-10-10  3:20 ` yalin wang
@ 2015-10-10 21:58 ` Theodore Ts'o
  2015-10-11  3:51   ` Dave Goel
  2015-10-11 17:44   ` Dave Goel
  1 sibling, 2 replies; 8+ messages in thread
From: Theodore Ts'o @ 2015-10-10 21:58 UTC (permalink / raw)
  To: Dave Goel; +Cc: linux-kernel

On Fri, Oct 09, 2015 at 10:00:34PM -0400, Dave Goel wrote:
> 
> All that the entire system needs is one queue of free PIDs. Any time you
> need a PID, take it from the head. Any time a PID is newly freed, push it at
> the back of the queue.  That's it! The overhead seems minimal to me.
> 
> The queue is initially populated by 2-32768, of course.

The worst-case overhead is 64k -- 2 bytes times 32k pid's.  You can
use a 64k circular buffer to store the list of pid's to use, sure.  So
the RAM utilization isn't _that_ bad, except that you need to keep one
of these for each pid namespace.  So for systems using a large number
of containers and lots of pid namespaces for isolation purposes, the
memory utilization overhead is not necessarily going to be considered
cheap.

But that's actually not the biggest problem.  The biggest problem is
that accessing this free pid queue is now a locking bottleneck ---
especially on a very large NUMA system, and especially on a system
where people are running tons of shell scripts that launch processes
all the time.  So in other words, those systems which are most likely
to suffer from pid wraparound will also be the systems that will be
punished the most by needing to take a lock each time you allocate a
new pid.

Given that there *are* people who use Linux on systems with hundreds
of CPU's, where global locks are exquisitely painful, the solution
you've outlined is not something that could be the only solution
available in the kernel.

In addition, most people don't find the workarounds to be terribly
onerous.  Using "trap" to catch signals and then having the shell
script clean up after itself (so you don't need to depend on a cleaner
program) is not terribly hard, and is considered best practice.

So adding something complex into the kernel just to work around sloppy
shell scripts doesn't seem like something that most people would
consider a great use of resources.  And if it causes significant
performance regressions on kernel scalability for large NUMA systems,
people will probably be even less interested in implementing something
just for the convenience of sloppy shell script programmers.  Telling
kernel developers that Linux's PID algorithm is braindead isn't going
to help.  :-)


So what to do instead?  I'm going to assume you are in an environment
where you have a huge number of legacy shell scripts and fixing them
all is too hard (tm).  What then?  Well, the fact that you are talking
about running some kind of task cleaner means that in all likelihood
it's running in some kind of structured environment where you know
that temp files will have a certain format.

So in that world, what I'd suggest is that you have all of the jobs be
started from a management daemon.  For the sake of argument, let's
call that management daemon a "borglet"[1].  The borglet starts each
of your jobs running on your machine, so it knows when the job exits,
and when it does, you can just have the borglet delete the job's
entire task directory.  For bonus points you could have the borglet
use container technology to control the amount of cpu, memory,
networking, and disk time used by a particular job, but if you need all
of that functionality, it might be simpler for you to grab Kubernetes and
just use that to control your jobs.  :-)

[1] http://thenewstack.io/google-lifts-the-veil-on-borg-revealing-apache-auroras-heritage/

The advantage of using Kubernetes is that when you're ready to take a
leap into cloud computing, it will be a lot less work since you will
have already structured your jobs in a way that makes this easy.  :-)

Cheers,

						- Ted


* Re: [Linux] Linux PID algorithm is BRAINDEAD!
  2015-10-10 21:58 ` Theodore Ts'o
@ 2015-10-11  3:51   ` Dave Goel
  2015-10-11 17:44   ` Dave Goel
  1 sibling, 0 replies; 8+ messages in thread
From: Dave Goel @ 2015-10-11  3:51 UTC (permalink / raw)
  To: Theodore Ts'o, Dave Goel, linux-kernel

Hi Ted,

Thanks for responding.  Fair points, all of them.

I would like to take exception to one of them, the bottleneck part:

> The biggest problem is that accessing this free pid queue is now
> a locking bottleneck --- especially on a very large NUMA system

That was exactly what I was trying to say towards the end: the
queue idea or implementation need not be strict.  No one cares if,
instead of grabbing the very first element of
the queue, you grab, say, the third element.  The /only/ real
requirement is that newly entered elements not go near the head of the
queue.

I would argue that bottleneck or resource locking doesn't exist at all:

If you have n cpu's, you can have n queues, albeit each now smaller.
(Or, equivalently, the cpu's can divvy up the queue into a mod-n
space.)  And the populator pushes into them in a round-robin
fashion.


* Re: [Linux] Linux PID algorithm is BRAINDEAD!
  2015-10-10 21:58 ` Theodore Ts'o
  2015-10-11  3:51   ` Dave Goel
@ 2015-10-11 17:44   ` Dave Goel
  2015-10-12 15:04     ` Theodore Ts'o
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Goel @ 2015-10-11 17:44 UTC (permalink / raw)
  To: Theodore Ts'o, Dave Goel, linux-kernel

Ted,

Here's a method to achieve the same goal (no immediate pid re-use),
but without using any queues whatsoever:

All freshly available PIDs are entered into PoolA.

Every N seconds, a timer moves PoolA->PoolB, and PoolB->Free PIDs.

And, the current PID allocation algo continues its allocation just the
way it is.

Then, a PID will wait between N and 2N seconds before getting re-allocated.

N could be 5? This method seems to eliminate any queues and not
cause any additional bottlenecks or locks?
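
A rough user-space model of the two-pool idea (a sketch; the pools are
plain byte-maps here, the names are made up, and a real implementation
would differ):

```c
/* Toy model of the two-pool aging scheme: a freed PID enters pool A;
 * every N seconds a timer "tick" first drains pool B into the free
 * set, then moves pool A into pool B.  A freed PID therefore waits
 * between N and 2N seconds before it can be re-allocated. */
#define NPIDS 32768

static unsigned char pool_a[NPIDS], pool_b[NPIDS], free_set[NPIDS];

static void pid_freed(int pid)        { pool_a[pid] = 1; }

static int pid_allocatable(int pid)   { return free_set[pid]; }

/* Runs once every N seconds.  Order matters: B must drain before A
 * moves in, so nothing jumps from A straight to the free set. */
static void timer_tick(void)
{
    int i;

    for (i = 0; i < NPIDS; i++)
        if (pool_b[i]) { free_set[i] = 1; pool_b[i] = 0; }
    for (i = 0; i < NPIDS; i++)
        if (pool_a[i]) { pool_b[i] = 1; pool_a[i] = 0; }
}
```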

Dave


* Re: [Linux] Linux PID algorithm is BRAINDEAD!
  2015-10-11 17:44   ` Dave Goel
@ 2015-10-12 15:04     ` Theodore Ts'o
  2015-10-12 17:49       ` Dave Goel
  0 siblings, 1 reply; 8+ messages in thread
From: Theodore Ts'o @ 2015-10-12 15:04 UTC (permalink / raw)
  To: Dave Goel; +Cc: linux-kernel

On Sun, Oct 11, 2015 at 01:44:02PM -0400, Dave Goel wrote:
> 
> Here's a method to achieve the same goal (no immediate pid re-use),
> but without using any queues whatsoever:
> 
> All freshly available PIDs are entered into PoolA.
> 
> Every N seconds, a timer moves PoolA->PoolB, and PoolB->Free PIDs.
> 
> And, the current PID allocation algo continues its allocation just the
> way it is.
> 
> Then, a PID will wait between N and 2N seconds before getting re-allocated.
> 
> N could be 5? This method seems to eliminate any queues and not
> cause any additional bottlenecks or locks?

This won't help your temp cleaner.  Consider: the pid wraps after 4
hours; then, 10 seconds before the temp cleaner runs, a process exits,
the pid gets reused, and you still have a race between the temp
cleaner testing whether or not a process exists, and the temp cleaner
deciding whether it's OK to delete the file with the form
prefix.<pid>.

Now, there are a bunch of things you can do to avoid the race, such as
having the shell script periodically touch the file, and simply
having the temp cleaner avoid deleting a file that has a recent mtime.
But you can do that without making any changes to the kernel.  Or you
can have the bash script keep a file descriptor open on the file, and
then delete the file, and then pass around the filename
/proc/$$/fd/<fd>.  Then when the shell script dies, the file(s)
automatically go away, and you don't need the temp cleaner, or the
kernel changes, at all.
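
The open-then-delete trick can be sketched in C as well (illustrative
only; the filename is made up, and this assumes a Linux-style /proc):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Sketch of the "keep an fd open, delete the name" trick: the data
 * stays reachable through /proc/self/fd/<fd> for as long as the fd is
 * open, and disappears automatically when the process exits; no temp
 * cleaner is needed.  Returns 1 if data written before unlink() is
 * still readable via /proc afterwards, 0 otherwise. */
int demo_unlinked_tempfile(void)
{
    char path[64], buf[6] = "";
    int fd2;
    ssize_t n;
    int fd = open("scratch.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return 0;
    unlink("scratch.tmp");           /* name is gone, data lives on */
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        return 0;
    }
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    fd2 = open(path, O_RDONLY);      /* re-open via the /proc name */
    if (fd2 < 0) {
        close(fd);
        return 0;
    }
    n = read(fd2, buf, 5);
    close(fd2);
    close(fd);
    return n == 5 && memcmp(buf, "hello", 5) == 0;
}
```

A shell script gets the same effect by opening the file on a numbered fd,
deleting it, and passing /proc/$$/fd/<fd> around, as described above.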

I can imagine solutions such as hacks where, after a process's existence
is tested using a kill 0 signal, that pid is prevented from being
reused for N seconds.  But now you open up denial-of-service attacks
where someone tries sending a kill 0 signal to the entire pidspace.
Ultimately, the question is whether trying to solve the problem in the
kernel is worth the effort, especially when there are userspace
solutions which don't require you to have a temp cleaner scanning the
entire file system.

Cheers,

						- Ted


* Re: [Linux] Linux PID algorithm is BRAINDEAD!
  2015-10-12 15:04     ` Theodore Ts'o
@ 2015-10-12 17:49       ` Dave Goel
  2015-10-12 18:12         ` Theodore Ts'o
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Goel @ 2015-10-12 17:49 UTC (permalink / raw)
  To: Theodore Ts'o, Dave Goel, linux-kernel

Ted,

Thanks for your patience with me.

I thought I had this down: I thought that the only real
problem is *immediate* re-use and the race conditions it causes,
and that for all other cases, scripts can find a way to work
around things.

But truly, your example stumps me atm.

Completely understood your point that the "real" solution is to
properly use trap in all scripts.  What that also entails in practice
is working with every Debian package and program...  If A calls B,
which calls C, which then calls D, and they are all programs
written in different languages, some of which don't even allow
arguments, etc.  But, indeed, that's not the kernel's "problem."


In the meanwhile, if deep inside D I am generating files that A
needs to be aware of (again, A through D are in different
languages, with their own broken utilities), I can always add
additional information to make the names unique.  I can use random
strings to make them unique enough, or, if I need them deterministic,
I can add pid + start time (jiffies of the proc) + Linux boot time
(btime from /proc/stat) to make them unique enough.  If I
truly had a unique PID, I would never have to worry about any of this,
I guess.
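
As a sketch of that deterministic variant (a hypothetical helper; it
assumes the Linux /proc layout, where field 22 of /proc/<pid>/stat is the
process start time in clock ticks since boot), reading the start time so
it can be combined with btime and the PID into a tag that survives PID
recycling:

```c
#include <stdio.h>
#include <string.h>

/* A PID alone can be recycled, but (boot time, start time, PID)
 * together name exactly one process for the life of the system.
 * Returns the start time (field 22 of /proc/<pid>/stat) in clock
 * ticks, or -1 on error. */
long long proc_start_ticks(int pid)
{
    char path[64], line[1024];
    char *p;
    long long start = -1;
    int field = 2;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* comm (field 2) may itself contain spaces and parentheses, so
     * skip past its *last* closing ')' before counting fields. */
    p = strrchr(line, ')');
    if (!p)
        return -1;
    for (p++; *p && field < 22; p++)
        if (*p == ' ')
            field++;            /* a space precedes each next field */
    if (field == 22)
        sscanf(p, "%lld", &start);
    return start;
}
```

The unique tag would then be something like pid + "-" + start_ticks +
"-" + btime, where btime is read once from /proc/stat.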



OTOH, I guess if one has to write cleaner/other meta-scripts without
proper traps, the cleaner can simply check whether Linux's PID counter
is too close to the current PID and, if so, refrain from drastic
actions.  For the latter, I wonder if there's a way to
get the "current PID counter."

Thanks again,
Dave


* Re: [Linux] Linux PID algorithm is BRAINDEAD!
  2015-10-12 17:49       ` Dave Goel
@ 2015-10-12 18:12         ` Theodore Ts'o
  0 siblings, 0 replies; 8+ messages in thread
From: Theodore Ts'o @ 2015-10-12 18:12 UTC (permalink / raw)
  To: Dave Goel; +Cc: linux-kernel

On Mon, Oct 12, 2015 at 01:49:59PM -0400, Dave Goel wrote:
> 
> OTOH, I guess if one has to write cleaner/other meta-scripts without
> proper traps, the cleaner can simply check whether Linux's PID counter
> is too close to the current PID and, if so, refrain from drastic
> actions.  For the latter, I wonder if there's a way to
> get the "current PID counter."

How about:

	current_pid = fork();
	if (current_pid == 0)
		exit(0);
	(void) waitpid(current_pid, NULL, 0);

Translating this to perl or python shouldn't be that difficult.

						- Ted


end of thread, other threads:[~2015-10-12 18:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-10  2:00 [Linux] Linux PID algorithm is BRAINDEAD! Dave Goel
2015-10-10  3:20 ` yalin wang
2015-10-10 21:58 ` Theodore Ts'o
2015-10-11  3:51   ` Dave Goel
2015-10-11 17:44   ` Dave Goel
2015-10-12 15:04     ` Theodore Ts'o
2015-10-12 17:49       ` Dave Goel
2015-10-12 18:12         ` Theodore Ts'o
