* [Patch 1/1] init: Provide a kernel start parameter to increase pid_max
@ 2010-04-21  1:40 Mike Travis
  2010-04-21  1:52 ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2 Mike Travis
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2010-04-21  1:40 UTC (permalink / raw)
  To: Ingo Molnar, Greg Kroah-Hartman, Linus Torvalds
  Cc: Hedi Berriche, Jack Steiner, Andrew Morton, Robin Holt, LKML

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max
From: Hedi Berriche <hedi@sgi.com>

On a system with a substantial number of processors, the early default pid_max
of 32k will not be enough.  On a system with 1664 CPUs, 25163 processes are
started before the login prompt.  We estimate that with 2048 CPUs we will pass
the 32k limit.  With 4096 CPUs, we will reach that limit very early in the boot
cycle, and processes will stall waiting for an available pid.

This provides a kernel start parameter to increase the early maximum number of
pids available.  It does not change any of the defaults.

Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>

---
 Documentation/kernel-parameters.txt |   11 +++++++++++
 kernel/pid.c                        |   30 ++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

--- linux-2.6.32.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.32/Documentation/kernel-parameters.txt
@@ -1327,6 +1327,17 @@ and is between 256 and 4096 characters.
 	max_luns=	[SCSI] Maximum number of LUNs to probe.
 			Should be between 1 and 2^32-1.
 
+	max_pid=nn[KMG]	[KNL] Maximum number of PIDs to use.  On a system
+			with a large number of processors, the default
+			pid_max may not be sufficient to allow the system
+			to boot.  The range of allowed values is limited from
+			pid_max_min to pid_max_max (configuration dependent).
+			See kernel/pid.c and include/linux/threads.h for
+			specific values.  A value that is too small may
+			cause the system to fail to boot, so it is ignored.
+			If the value is too large, the largest allowed
+			value is used instead.
+
 	max_report_luns=
 			[SCSI] Maximum number of LUNs received.
 			Should be between 1 and 16384.
--- linux-2.6.32.orig/kernel/pid.c
+++ linux-2.6.32/kernel/pid.c
@@ -53,6 +53,36 @@ int pid_max_max = PID_MAX_LIMIT;
 #define BITS_PER_PAGE		(PAGE_SIZE*8)
 #define BITS_PER_PAGE_MASK	(BITS_PER_PAGE-1)
 
+static int __init set_pid_max(char *str)
+{
+	u64 maxp;
+
+	if (!str)
+		return -EINVAL;
+
+	maxp = memparse(str, &str);
+
+	if (maxp < pid_max_min) {
+		pr_warning(
+		    "pid_max smaller than minimum allowed value (%d)\n",
+		    pid_max_min);
+		return -EINVAL;
+	}
+	if (maxp > pid_max_max) {
+		pr_warning(
+		    "pid_max larger than maximum allowed value, using %d\n",
+		    pid_max_max);
+		pid_max = pid_max_max;
+	} else {
+		pid_max = maxp;
+		pr_info("pid_max set to %d\n", pid_max);
+	}
+
+	return 0;
+}
+
+early_param("pid_max", set_pid_max);
+
 static inline int mk_pid(struct pid_namespace *pid_ns,
 		struct pidmap *map, int off)
 {

* [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21  1:40 [Patch 1/1] init: Provide a kernel start parameter to increase pid_max Mike Travis
@ 2010-04-21  1:52 ` Mike Travis
  2010-04-21  9:23   ` Alan Cox
  0 siblings, 1 reply; 32+ messages in thread
From: Mike Travis @ 2010-04-21  1:52 UTC (permalink / raw)
  To: Ingo Molnar, Greg Kroah-Hartman, Linus Torvalds
  Cc: Hedi Berriche, Jack Steiner, Andrew Morton, Robin Holt, LKML

[Sorry, the previous patch I sent was an incorrect version.  The arg specified
in the Documentation file was wrong.]

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max
From: Hedi Berriche <hedi@sgi.com>

On a system with a substantial number of processors, the early default pid_max
of 32k will not be enough.  On a system with 1664 CPUs, 25163 processes are
started before the login prompt.  We estimate that with 2048 CPUs we will pass
the 32k limit.  With 4096 CPUs, we will reach that limit very early in the boot
cycle, and processes will stall waiting for an available pid.

This provides a kernel start parameter to increase the early maximum number of
pids available.  It does not change any of the defaults.

Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>

---
 Documentation/kernel-parameters.txt |   11 +++++++++++
 kernel/pid.c                        |   30 ++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

--- linux-2.6.32.orig/Documentation/kernel-parameters.txt
+++ linux-2.6.32/Documentation/kernel-parameters.txt
@@ -2033,6 +2033,17 @@ and is between 256 and 4096 characters.
 	pg.		[PARIDE]
 			See Documentation/blockdev/paride.txt.
 
+	pid_max=nn[KMG]	[KNL] Maximum number of PIDs to use.  On a system
+			with a large number of processors, the default
+			pid_max may not be sufficient to allow the system
+			to boot.  The range of allowed values is limited from
+			pid_max_min to pid_max_max (configuration dependent).
+			See kernel/pid.c and include/linux/threads.h for
+			specific values.  A value that is too small may
+			cause the system to fail to boot, so it is ignored.
+			If the value is too large, the largest allowed
+			value is used instead.
+
 	pirq=		[SMP,APIC] Manual mp-table setup
 			See Documentation/x86/i386/IO-APIC.txt.
 
--- linux-2.6.32.orig/kernel/pid.c
+++ linux-2.6.32/kernel/pid.c
@@ -53,6 +53,36 @@ int pid_max_max = PID_MAX_LIMIT;
 #define BITS_PER_PAGE		(PAGE_SIZE*8)
 #define BITS_PER_PAGE_MASK	(BITS_PER_PAGE-1)
 
+static int __init set_pid_max(char *str)
+{
+	u64 maxp;
+
+	if (!str)
+		return -EINVAL;
+
+	maxp = memparse(str, &str);
+
+	if (maxp < pid_max_min) {
+		pr_warning(
+		    "pid_max smaller than minimum allowed value (%d)\n",
+		    pid_max_min);
+		return -EINVAL;
+	}
+	if (maxp > pid_max_max) {
+		pr_warning(
+		    "pid_max larger than maximum allowed value, using %d\n",
+		    pid_max_max);
+		pid_max = pid_max_max;
+	} else {
+		pid_max = maxp;
+		pr_info("pid_max set to %d\n", pid_max);
+	}
+
+	return 0;
+}
+
+early_param("pid_max", set_pid_max);
+
 static inline int mk_pid(struct pid_namespace *pid_ns,
 		struct pidmap *map, int off)
 {
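
The policy in set_pid_max() above (reject values below the minimum, clamp
values above the maximum, accept anything in between) can be tried out in
user space.  What follows is only a rough sketch under stated assumptions:
sketch_memparse() is a hypothetical stand-in for the kernel's memparse() that
handles just the K/M/G suffixes, and the two SKETCH_ bounds are illustrative
values, since the real pid_max_min and pid_max_max are configuration
dependent.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative bounds only; the real limits depend on the kernel config. */
#define SKETCH_PID_MAX_MIN 301
#define SKETCH_PID_MAX_MAX (4 * 1024 * 1024)

/* Hypothetical stand-in for memparse(): a number plus an optional K/M/G. */
static uint64_t sketch_memparse(const char *s)
{
	char *end;
	uint64_t v = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10; /* fall through */
	case 'M': case 'm':
		v <<= 10; /* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Same decisions as the patch: too small is an error, too large is
 * clamped, anything else is taken as given.  Returns 0 and stores the
 * result in *pid_max, or returns -EINVAL and leaves *pid_max untouched.
 */
static int sketch_set_pid_max(const char *str, int *pid_max)
{
	uint64_t maxp;

	assert(pid_max != NULL);
	if (!str)
		return -EINVAL;

	maxp = sketch_memparse(str);
	if (maxp < SKETCH_PID_MAX_MIN)
		return -EINVAL;
	if (maxp > SKETCH_PID_MAX_MAX)
		*pid_max = SKETCH_PID_MAX_MAX;
	else
		*pid_max = (int)maxp;
	return 0;
}
```

With these assumed bounds, "128k" is accepted as 131072, "100" is rejected,
and "1G" is clamped to the assumed maximum.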

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21  1:52 ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2 Mike Travis
@ 2010-04-21  9:23   ` Alan Cox
  2010-04-21 16:59     ` Hedi Berriche
  0 siblings, 1 reply; 32+ messages in thread
From: Alan Cox @ 2010-04-21  9:23 UTC (permalink / raw)
  To: Mike Travis
  Cc: Ingo Molnar, Greg Kroah-Hartman, Linus Torvalds, Hedi Berriche,
	Jack Steiner, Andrew Morton, Robin Holt, LKML

> of 32k will not be enough.  A system with 1664 CPU's, there are 25163 processes
> started before the login prompt.  It's estimated that with 2048 CPU's we will pass

Is that perhaps the bug, rather than the 32K limit?  And does Tejun's work on
workqueue sanity help avoid the need for this?

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21  9:23   ` Alan Cox
@ 2010-04-21 16:59     ` Hedi Berriche
  2010-04-21 17:18       ` Rik van Riel
  2010-04-21 17:58       ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2 Alan Cox
  0 siblings, 2 replies; 32+ messages in thread
From: Hedi Berriche @ 2010-04-21 16:59 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mike Travis, Ingo Molnar, Greg Kroah-Hartman, Linus Torvalds,
	Jack Steiner, Andrew Morton, Robin Holt, LKML

On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
| > of 32k will not be enough.  A system with 1664 CPU's, there are 25163 processes
| > started before the login prompt.  It's estimated that with 2048 CPU's we will pass
| 
| Is that perhaps the bug not the 32K limit?

Doubt it: I just checked on an *idle* 1664-CPU system and I can see 26844
tasks, all but a few being kernel threads.

In the worst case, i.e. a 4096-CPU system (+ typically thousands of disks), it
will most certainly be a pain to boot, if it ever manages to, when pid_max is
set to 32K.

Cheers,
Hedi.
-- 
Be careful of reading health books, you might die of a misprint.
	-- Mark Twain

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 16:59     ` Hedi Berriche
@ 2010-04-21 17:18       ` Rik van Riel
  2010-04-21 17:54         ` Mike Travis
  2010-04-21 19:14         ` John Stoffel
  2010-04-21 17:58       ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2 Alan Cox
  1 sibling, 2 replies; 32+ messages in thread
From: Rik van Riel @ 2010-04-21 17:18 UTC (permalink / raw)
  To: Hedi Berriche
  Cc: Alan Cox, Mike Travis, Ingo Molnar, Greg Kroah-Hartman,
	Linus Torvalds, Jack Steiner, Andrew Morton, Robin Holt, LKML

On 04/21/2010 12:59 PM, Hedi Berriche wrote:
> On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
> |>  of 32k will not be enough.  A system with 1664 CPU's, there are 25163 processes
> |>  started before the login prompt.  It's estimated that with 2048 CPU's we will pass
> |
> | Is that perhaps the bug not the 32K limit?
>
> Doubt it: I just checked on an *idle* 1664 CPUs system and I can see 26844
> tasks, all but few being kernel threads.

That is 15 kernel threads per CPU.

Reducing the number of kernel threads sounds like a
useful thing to do.

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 17:18       ` Rik van Riel
@ 2010-04-21 17:54         ` Mike Travis
  2010-04-21 19:14         ` John Stoffel
  1 sibling, 0 replies; 32+ messages in thread
From: Mike Travis @ 2010-04-21 17:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hedi Berriche, Alan Cox, Ingo Molnar, Greg Kroah-Hartman,
	Linus Torvalds, Jack Steiner, Andrew Morton, Robin Holt, LKML



Rik van Riel wrote:
> On 04/21/2010 12:59 PM, Hedi Berriche wrote:
>> On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
>> |>  of 32k will not be enough.  A system with 1664 CPU's, there are 25163 processes
>> |>  started before the login prompt.  It's estimated that with 2048 CPU's we will pass
>> |
>> | Is that perhaps the bug not the 32K limit?
>>
>> Doubt it: I just checked on an *idle* 1664 CPUs system and I can see 26844
>> tasks, all but few being kernel threads.
> 
> That is 15 kernel threads per CPU.
> 
> Reducing the number of kernel threads sounds like a
> useful thing to do.

I'm doing more research but all the udev modprobes seem to spawn
quite a few tasks.  And even though they go away, when the pid
pool is limited, I'm guessing many of them are waiting.

On the last test I did yesterday, the pid # was up in the 77000
range at the login prompt (I started the 1664 cpu system with
pid_max=128k).

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 16:59     ` Hedi Berriche
  2010-04-21 17:18       ` Rik van Riel
@ 2010-04-21 17:58       ` Alan Cox
  2010-04-21 19:12         ` Hedi Berriche
  1 sibling, 1 reply; 32+ messages in thread
From: Alan Cox @ 2010-04-21 17:58 UTC (permalink / raw)
  To: Hedi Berriche
  Cc: Mike Travis, Ingo Molnar, Greg Kroah-Hartman, Linus Torvalds,
	Jack Steiner, Andrew Morton, Robin Holt, LKML

On Wed, 21 Apr 2010 17:59:34 +0100
Hedi Berriche <hedi@sgi.com> wrote:

> On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
> | > of 32k will not be enough.  A system with 1664 CPU's, there are 25163 processes
> | > started before the login prompt.  It's estimated that with 2048 CPU's we will pass
> | 
> | Is that perhaps the bug not the 32K limit?
> 
> Doubt it: I just checked on an *idle* 1664 CPUs system and I can see 26844
> tasks, all but few being kernel threads.

So why have we got 26844 tasks?  Isn't that a rather more relevant
question?

And as I asked before - how does Tejun's work on sanitizing work queues
affect this?

Alan

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 17:58       ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2 Alan Cox
@ 2010-04-21 19:12         ` Hedi Berriche
  2010-04-21 19:51           ` Greg KH
  2010-04-21 22:05           ` Jack Steiner
  0 siblings, 2 replies; 32+ messages in thread
From: Hedi Berriche @ 2010-04-21 19:12 UTC (permalink / raw)
  To: Alan Cox
  Cc: Mike Travis, Ingo Molnar, Greg Kroah-Hartman, Linus Torvalds,
	Jack Steiner, Andrew Morton, Robin Holt, LKML

On Wed, Apr 21, 2010 at 18:54 Alan Cox wrote:
| Hedi Berriche <hedi@sgi.com> wrote:
|
| > I just checked on an *idle* 1664 CPUs system and I can see 26844 tasks, all
| > but few being kernel threads.
| 
| So why have we got 26844 tasks. Isn't that a rather more relevant
| question.

OK, here's a rough breakdown of the tasks

     104 kswapd
    1664 aio
    1664 ata
    1664 crypto
    1664 events
    1664 ib_cm
    1664 kintegrityd
    1664 kondemand
    1664 ksoftirqd
    1664 kstop
    1664 migration
    1664 rpciod
    1664 scsi_tgtd
    1664 xfsconvertd
    1664 xfsdatad
    1664 xfslogd

that's 25064, omitting the rest as its contribution to the overall total is
negligible.
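
The 25064 figure can be checked by hand: the list has one 104-thread entry
(kswapd) and 15 entries of 1664 threads each.  Below is a small sketch of that
arithmetic.  The helper name and the idea of reusing the formula for other CPU
counts are assumptions for illustration only; in particular it holds the
kswapd count fixed, which a larger machine would not do.

```c
#include <stdio.h>

#define KSWAPD_THREADS 104  /* the one entry in the list that is not 1664 */
#define PER_CPU_KINDS  15   /* aio, ata, crypto, ..., xfslogd */

/* Threads accounted for by the breakdown, for a given CPU count. */
static long breakdown_total(long cpus)
{
	return KSWAPD_THREADS + (long)PER_CPU_KINDS * cpus;
}

/* Print the total next to the 32k default for a quick comparison. */
static void print_breakdown(long cpus)
{
	printf("%ld CPUs: %ld threads (default pid_max is 32768)\n",
	       cpus, breakdown_total(cpus));
}
```

For 1664 CPUs this gives 104 + 15 * 1664 = 25064, matching the total above.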

[[

Let's also not forget all those ephemeral user space tasks (udev and the like)
that will be spawned at boot time on even larger systems with even more
thousands of disks.  Arguably one might consider hacking the initrd and similar
to work around the problem and set pid_max as soon as /proc becomes available,
but it's a bit of a PITA.

]]
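
For reference, the runtime workaround mentioned in the aside amounts to
writing a number to /proc/sys/kernel/pid_max once /proc is mounted.  A minimal
sketch of that step follows; the helper names are hypothetical, the path is
passed in so the code can be exercised against an ordinary file, and writing
to the real sysctl file needs root.

```c
#include <stdio.h>

/* Write a new limit to the given file; returns 0 on success, -1 on failure. */
static int write_pid_max(const char *path, long value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%ld\n", value) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Read the current limit back; returns -1 if it cannot be read or parsed. */
static long read_pid_max(const char *path)
{
	long value = -1;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

An early boot script would call write_pid_max("/proc/sys/kernel/pid_max", ...)
as soon as it can.  Any tasks created before that point still run under the
default limit, which is the gap the boot parameter is meant to close.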

| And as I asked before - how does Tejun's work on sanitizing work queues
| affect this ?

I'm not familiar with the work in question so I (we) will have to look it up
and see whether it's relevant to what we're seeing here. It does sound
like it might help, to a certain extent at least.

That said, while I am genuinely interested in spending time on this and digging
further to see whether something has been or can be done to keep the number of
tasks required to comfortably boot a system of this size under control, I think
that in the meantime the boot parameter approach is useful in the sense that it
addresses the immediate problem of being able to boot such systems *without*
any risk of breaking the code or altering the default behaviour.

Cheers,
Hedi.
-- 
Be careful of reading health books, you might die of a misprint.
	-- Mark Twain

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 17:18       ` Rik van Riel
  2010-04-21 17:54         ` Mike Travis
@ 2010-04-21 19:14         ` John Stoffel
  2010-04-21 19:33           ` Hedi Berriche
  1 sibling, 1 reply; 32+ messages in thread
From: John Stoffel @ 2010-04-21 19:14 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Hedi Berriche, Alan Cox, Mike Travis, Ingo Molnar,
	Greg Kroah-Hartman, Linus Torvalds, Jack Steiner, Andrew Morton,
	Robin Holt, LKML

>>>>> "Rik" == Rik van Riel <riel@redhat.com> writes:

Rik> On 04/21/2010 12:59 PM, Hedi Berriche wrote:
>> On Wed, Apr 21, 2010 at 10:20 Alan Cox wrote:
>> |>  of 32k will not be enough.  A system with 1664 CPU's, there are 25163 processes
>> |>  started before the login prompt.  It's estimated that with 2048 CPU's we will pass
>> |
>> | Is that perhaps the bug not the 32K limit?
>> 
>> Doubt it: I just checked on an *idle* 1664 CPUs system and I can see 26844
>> tasks, all but few being kernel threads.

Rik> That is 15 kernel threads per CPU.

Rik> Reducing the number of kernel threads sounds like a
Rik> useful thing to do.

Isn't that already a project?  I thought someone (Jeff? Jorn? Tejun? Bueller
bueller....?) was already proposing a patch set to reduce the number
of kernel threads by having dynamic workqueues instead, so that we
didn't spawn a bunch of threads that never did anything?

John

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 19:14         ` John Stoffel
@ 2010-04-21 19:33           ` Hedi Berriche
  2010-04-21 20:10             ` John Stoffel
  0 siblings, 1 reply; 32+ messages in thread
From: Hedi Berriche @ 2010-04-21 19:33 UTC (permalink / raw)
  To: John Stoffel
  Cc: Rik van Riel, Alan Cox, Mike Travis, Ingo Molnar,
	Greg Kroah-Hartman, Linus Torvalds, Jack Steiner, Andrew Morton,
	Robin Holt, LKML

On Wed, Apr 21, 2010 at 20:15 John Stoffel wrote:
| >>>>> "Rik" == Rik van Riel <riel@redhat.com> writes:
| 
| Rik> That is 15 kernel threads per CPU.
| 
| Rik> Reducing the number of kernel threads sounds like a
| Rik> useful thing to do.
| 
| Isn't that already a project? 

Yes, thanks to Alan's probing I looked it up 

    http://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git

but we're definitely talking long term solution vs. something that can ease
pain now.

Cheers,
Hedi.
-- 
Be careful of reading health books, you might die of a misprint.
	-- Mark Twain

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 19:12         ` Hedi Berriche
@ 2010-04-21 19:51           ` Greg KH
  2010-04-21 20:12             ` Hedi Berriche
  2010-04-21 22:05           ` Jack Steiner
  1 sibling, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-04-21 19:51 UTC (permalink / raw)
  To: Hedi Berriche
  Cc: Alan Cox, Mike Travis, Ingo Molnar, Linus Torvalds, Jack Steiner,
	Andrew Morton, Robin Holt, LKML

On Wed, Apr 21, 2010 at 08:12:13PM +0100, Hedi Berriche wrote:
> Let's also not forget all those ephemeral user space tasks (udev and the likes)
> that will be spawned at boot time on even large systems with even more
> thousands of disks, arguably one might consider hack initrd and similar to work
> around the problem and set pid_max as soon as /proc becomes available but it's
> a bit of a PITA.

udev should properly handle large numbers of cpus and the tasks that it
spawns so as to not overload things.  If not, and you feel it is
creating too many tasks, please let the udev developers know and they
will be glad to work with you on this issue.

thanks,

greg k-h

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 19:33           ` Hedi Berriche
@ 2010-04-21 20:10             ` John Stoffel
  2010-04-21 22:24               ` Greg KH
  0 siblings, 1 reply; 32+ messages in thread
From: John Stoffel @ 2010-04-21 20:10 UTC (permalink / raw)
  To: Hedi Berriche
  Cc: John Stoffel, Rik van Riel, Alan Cox, Mike Travis, Ingo Molnar,
	Greg Kroah-Hartman, Linus Torvalds, Jack Steiner, Andrew Morton,
	Robin Holt, LKML

>>>>> "Hedi" == Hedi Berriche <hedi@sgi.com> writes:

Hedi> On Wed, Apr 21, 2010 at 20:15 John Stoffel wrote:
Hedi> | >>>>> "Rik" == Rik van Riel <riel@redhat.com> writes:
Hedi> | 
Hedi> | Rik> That is 15 kernel threads per CPU.
Hedi> | 
Hedi> | Rik> Reducing the number of kernel threads sounds like a
Hedi> | Rik> useful thing to do.
Hedi> | 
Hedi> | Isn't that already a project? 

Hedi> Yes, thanks to Alan's probing I looked it up 

Hedi>     http://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git

Hedi> but we're definitely talking long term solution vs. something
Hedi> that can ease pain now.

It seems to me that running Linux on such a large machine is such a
specialized niche that putting your change into the regular kernel
isn't a near term need either.  And from the sounds of it, Tejun's
work has better long term potential.

But hey, I'm generally clueless, so take what I say with a grain of
salt.  :]  

John

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 19:51           ` Greg KH
@ 2010-04-21 20:12             ` Hedi Berriche
  0 siblings, 0 replies; 32+ messages in thread
From: Hedi Berriche @ 2010-04-21 20:12 UTC (permalink / raw)
  To: Greg KH
  Cc: Alan Cox, Mike Travis, Ingo Molnar, Linus Torvalds, Jack Steiner,
	Andrew Morton, Robin Holt, LKML

On Wed, Apr 21, 2010 at 20:52 Greg KH wrote:
| On Wed, Apr 21, 2010 at 08:12:13PM +0100, Hedi Berriche wrote:
| > Let's also not forget all those ephemeral user space tasks (udev and the likes)
| > that will be spawned at boot time on even large systems with even more
| > thousands of disks, arguably one might consider hack initrd and similar to work
| > around the problem and set pid_max as soon as /proc becomes available but it's
| > a bit of a PITA.
| 
| udev should properly handle large numbers of cpus and the tasks that it
| spawns so as to not overload things.  If not, and you feel it is
| creating too many tasks, please let the udev developers know and they
| will be glad to work with you on this issue.

Just to be clear here --and be done with the udev parenthesis-- we kind of need
udev to take advantage of the fact that there's a large number of CPUs on the
machine, especially in the case of a config with thousands of disks, as that
shortens the time required to have a box in a working state with all disks
available and all.

IOW, I am not after throttling or serialising udev; I just mentioned it as an
example of a user space beast that can contribute --in the current state of
things-- to the need for a large pid_max on certain configurations.

That said, I do realise that bit too should be looked at, and any problems, as
you quite rightly pointed out, should be discussed with the udev chaps.

Cheers,
Hedi.
-- 
Be careful of reading health books, you might die of a misprint.
	-- Mark Twain

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 19:12         ` Hedi Berriche
  2010-04-21 19:51           ` Greg KH
@ 2010-04-21 22:05           ` Jack Steiner
  1 sibling, 0 replies; 32+ messages in thread
From: Jack Steiner @ 2010-04-21 22:05 UTC (permalink / raw)
  To: Hedi Berriche
  Cc: Alan Cox, Mike Travis, Ingo Molnar, Greg Kroah-Hartman,
	Linus Torvalds, Andrew Morton, Robin Holt, LKML

On Wed, Apr 21, 2010 at 08:12:13PM +0100, Hedi Berriche wrote:
> On Wed, Apr 21, 2010 at 18:54 Alan Cox wrote:
> | Hedi Berriche <hedi@sgi.com> wrote:
> |
> | > I just checked on an *idle* 1664 CPUs system and I can see 26844 tasks, all
> | > but few being kernel threads.
> | 
> | So why have we got 26844 tasks. Isn't that a rather more relevant
> | question.
> 
> OK, here's a rough breakdown of the tasks
> 
>      104 kswapd
>     1664 aio
>     1664 ata
>     1664 crypto
>     1664 events
>     1664 ib_cm
>     1664 kintegrityd
>     1664 kondemand
>     1664 ksoftirqd
>     1664 kstop
>     1664 migration
>     1664 rpciod
>     1664 scsi_tgtd
>     1664 xfsconvertd
>     1664 xfsdatad
>     1664 xfslogd
> 
> that's 25064, omitting the rest as its contribution to the overall total is
> negligible.

Also, our target for the number of cpus is 4096. We are not even halfway there.
(I certainly expect other issues to arise scaling to 4096p but running out of pids
_should_ not be one of them...)



> 
> [[
> 
> Let's also not forget all those ephemeral user space tasks (udev and the likes)
> that will be spawned at boot time on even large systems with even more
> thousands of disks, arguably one might consider hack initrd and similar to work
> around the problem and set pid_max as soon as /proc becomes available but it's
> a bit of a PITA.
> 
> ]]
> 
> | And as I asked before - how does Tejun's work on sanitizing work queues
> | affect this ?
> 
> I'm not familiar with the work in question so I (we) will have to look it up,
> and at it and see whether it's relevant to what we're seeing here. It does sound
> like it might help, to certain extent at least.
> 
> That said, while I am genuinely interested in spending time on this and digging
> further to see whether something has/can be done about keeping under control the
> number of tasks required to comfortably boot a system of this size, I think that
> in the meantime the boot parameter approach is useful in the sense that it addresses
> the immediate problem of being able such systems *without* any risk to break the
> code or alter the default behaviour.
> 
> Cheers,
> Hedi.
> -- 
> Be careful of reading health books, you might die of a misprint.
> 	-- Mark Twain

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 20:10             ` John Stoffel
@ 2010-04-21 22:24               ` Greg KH
  2010-04-21 22:49                 ` Rik van Riel
  0 siblings, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-04-21 22:24 UTC (permalink / raw)
  To: John Stoffel
  Cc: Hedi Berriche, Rik van Riel, Alan Cox, Mike Travis, Ingo Molnar,
	Linus Torvalds, Jack Steiner, Andrew Morton, Robin Holt, LKML

On Wed, Apr 21, 2010 at 04:10:08PM -0400, John Stoffel wrote:
> >>>>> "Hedi" == Hedi Berriche <hedi@sgi.com> writes:
> 
> Hedi> On Wed, Apr 21, 2010 at 20:15 John Stoffel wrote:
> Hedi> | >>>>> "Rik" == Rik van Riel <riel@redhat.com> writes:
> Hedi> | 
> Hedi> | Rik> That is 15 kernel threads per CPU.
> Hedi> | 
> Hedi> | Rik> Reducing the number of kernel threads sounds like a
> Hedi> | Rik> useful thing to do.
> Hedi> | 
> Hedi> | Isn't that already a project? 
> 
> Hedi> Yes, thanks to Alan's probing I looked it up 
> 
> Hedi>     http://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git
> 
> Hedi> but we're definitely talking long term solution vs. something
> Hedi> that can ease pain now.
> 
> It seems to me that running Linux on such a large machine is such a
> specialized niche, the putting in your change to the regular kernel
> isn't a near term need either.  And from the sounds of it, Tejun's
> work has better long term potential.

Tejun's work has much better long term potential, but this is still an
issue for large #cpu systems, which we want Linux to support well.  This
isn't a "specialized niche" for Linux, at all, Linux pretty much
dominates this hardware area, and it would be nice to ensure that this
continues.

thanks,

greg k-h

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 22:24               ` Greg KH
@ 2010-04-21 22:49                 ` Rik van Riel
  2010-04-21 23:22                   ` Greg KH
  0 siblings, 1 reply; 32+ messages in thread
From: Rik van Riel @ 2010-04-21 22:49 UTC (permalink / raw)
  To: Greg KH
  Cc: John Stoffel, Hedi Berriche, Alan Cox, Mike Travis, Ingo Molnar,
	Linus Torvalds, Jack Steiner, Andrew Morton, Robin Holt, LKML

On 04/21/2010 06:24 PM, Greg KH wrote:

> Tejun's work has much better long term potential, but this is still an
> issue for large #cpu systems, which we want Linux to support well.  This
> isn't a "specialized niche" for Linux, at all, Linux pretty much
> dominates this hardware area, and it would be nice to ensure that this
> continues.

Yes, the pid_max patch seems like a decent stop gap for
distro kernels right now.  However, Tejun's work is
probably a more appropriate path forward.

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 22:49                 ` Rik van Riel
@ 2010-04-21 23:22                   ` Greg KH
  2010-04-22  9:28                     ` Alan Cox
  0 siblings, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-04-21 23:22 UTC (permalink / raw)
  To: Rik van Riel
  Cc: John Stoffel, Hedi Berriche, Alan Cox, Mike Travis, Ingo Molnar,
	Linus Torvalds, Jack Steiner, Andrew Morton, Robin Holt, LKML

On Wed, Apr 21, 2010 at 06:49:22PM -0400, Rik van Riel wrote:
> On 04/21/2010 06:24 PM, Greg KH wrote:
> 
> >Tejun's work has much better long term potential, but this is still an
> >issue for large #cpu systems, which we want Linux to support well.  This
> >isn't a "specialized niche" for Linux, at all, Linux pretty much
> >dominates this hardware area, and it would be nice to ensure that this
> >continues.
> 
> Yes, the pid_max patch seems like a decent stop gap for
> distro kernels right now.  However, Tejun's work is
> probably a more appropriate path forward.

Distros don't want to take a patch that adds a new boot param that is
not accepted upstream, otherwise they will be stuck forward porting it
from now until, well, forever :)

As this solves a problem that people are having today, on the kernel.org
kernel, on a known machine, and we really don't know when the "reduce
the number of processes per cpu" work will be done, or if it really will
solve this issue, then why can't we take it now?  If the work does solve
the problem in the future, then we can take the command line option out,
and everyone is happy.

Sound reasonable?

thanks,

greg k-h

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-21 23:22                   ` Greg KH
@ 2010-04-22  9:28                     ` Alan Cox
  2010-04-22 12:58                       ` Jack Steiner
                                         ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Alan Cox @ 2010-04-22  9:28 UTC (permalink / raw)
  To: Greg KH
  Cc: Rik van Riel, John Stoffel, Hedi Berriche, Mike Travis,
	Ingo Molnar, Linus Torvalds, Jack Steiner, Andrew Morton,
	Robin Holt, LKML

> Distros don't want to take a patch that adds a new boot param that is
> not accepted upstream, otherwise they will be stuck forward porting it
> from now until, well, forever :)

So for an obscure IA64 specific problem you want the upstream kernel to
port it forward forever instead?
> 
> As this solves a problem that people are having today, on the kernel.org
> kernel, on a known machine, and we really don't know when the "reduce
> the number of processes per cpu" work will be done, or if it really will
> solve this issue, then why can't we take it now?  If the work does solve
> the problem in the future, then we can take the command line option out,
> and everyone is happy.
> 
> Sound reasonable?

No - to start with it would be far saner for everyone involved if the
4096 processor minority fixed it for the moment in their arch code by
doing something like

	if (max_pids < PIDS_PER_CPU * num_cpus) {
		max_pids = ...
		printk(something informative)
	}

in their __init marked code.

Because when Tejun's stuff is in, the patch can go away, and if it's
not sufficient then the patch above should keep things sane when they go to
32000 cpus or whatever is next.

Alan
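
One way the scaling idea in the pseudocode above could look is sketched below
as a plain user-space function.  Everything concrete in it is an assumption
made for illustration: the per-CPU budget of 1024, the upper limit, and the
function name are not taken from the thread, and the pseudocode itself leaves
the new value unspecified.

```c
/* Assumed values for illustration; neither number comes from the thread. */
#define SKETCH_PIDS_PER_CPU  1024
#define SKETCH_PID_MAX_LIMIT (4 * 1024 * 1024)

/*
 * Raise pid_max to SKETCH_PIDS_PER_CPU * num_cpus if it is below that,
 * never exceeding SKETCH_PID_MAX_LIMIT.  Values that are already large
 * enough are returned unchanged.
 */
static int scale_pid_max(int pid_max, int num_cpus)
{
	long long wanted = (long long)SKETCH_PIDS_PER_CPU * num_cpus;

	if (wanted > SKETCH_PID_MAX_LIMIT)
		wanted = SKETCH_PID_MAX_LIMIT;
	if (pid_max < wanted)
		pid_max = (int)wanted;
	return pid_max;
}
```

Under these assumptions an 8-CPU machine keeps the 32768 default, a 1664-CPU
machine gets 1703936, and anything from 4096 CPUs upward is capped at the
assumed limit.  Unlike a boot parameter, this needs no per-machine
configuration, at the cost of hard-coding a per-CPU estimate.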

* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-22  9:28                     ` Alan Cox
@ 2010-04-22 12:58                       ` Jack Steiner
  2010-04-22 13:57                       ` Robin Holt
  2010-04-22 14:48                       ` Linus Torvalds
  2 siblings, 0 replies; 32+ messages in thread
From: Jack Steiner @ 2010-04-22 12:58 UTC (permalink / raw)
  To: Alan Cox
  Cc: Greg KH, Rik van Riel, John Stoffel, Hedi Berriche, Mike Travis,
	Ingo Molnar, Linus Torvalds, Andrew Morton, Robin Holt, LKML

On Thu, Apr 22, 2010 at 10:28:52AM +0100, Alan Cox wrote:
> > Distros don't want to take a patch that adds a new boot param that is
> > not accepted upstream, otherwise they will be stuck forward porting it
> > from now until, well, forever :)
> 
> So for an obscure IA64 specific problem you want the upstream kernel to
> port it forward forever instead ?

FWIW, the problem is occurring on systems that use x86 processors - not
IA64.


> > 
> > As this solves a problem that people are having today, on the kernel.org
> > kernel, on a known machine, and we really don't know when the "reduce
> > the number of processes per cpu" work will be done, or if it really will
> > solve this issue, then why can't we take it now?  If the work does solve
> > the problem in the future, then we can take the command line option out,
> > and everyone is happy.
> > 
> > Sound reasonable?
> 
> No - to start with it would be far saner for everything involved if the
> 4096 processor minority fixed it for the moment in their arch code by
> doing something like
> 
> 	if (max_pids < PIDS_PER_CPU * num_cpus) {
> 		max_pids = ...
> 		printk(something informative)
> 	}
> 
> in their __init marked code.
> 
> Because when Tejun's stuff is in the patch can go away, and also if it's
> not sufficient then the patch above should keep it sane when they go to
> 32000 cpus or whatever is next.
> 
> Alan


* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-22  9:28                     ` Alan Cox
  2010-04-22 12:58                       ` Jack Steiner
@ 2010-04-22 13:57                       ` Robin Holt
  2010-04-22 14:48                       ` Linus Torvalds
  2 siblings, 0 replies; 32+ messages in thread
From: Robin Holt @ 2010-04-22 13:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: Greg KH, Rik van Riel, John Stoffel, Hedi Berriche, Mike Travis,
	Ingo Molnar, Linus Torvalds, Jack Steiner, Andrew Morton,
	Robin Holt, LKML

> No - to start with it would be far saner for everything involved if the
> 4096 processor minority fixed it for the moment in their arch code by
> doing something like
> 
> 	if (max_pids < PIDS_PER_CPU * num_cpus) {
> 		max_pids = ...
> 		printk(something informative)
> 	}
> 
> in their __init marked code.

I don't understand how it would be possible for the arch maintainers
to predict what a particular machine's configuration would need for
PIDS_PER_CPU.  Many of the extra pids needed on a per-cpu basis are
brought in by device drivers or subsystems.

Are you proposing a typical configuration be used for the basis or an
extreme configuration?

If your basis is the typical configuration, how would an administrator
of the extreme configuration get themselves out of the situation of
pid_max being too small without the same command line option?

If we use the extreme case, then we end up with a lot of extraneous pids,
however I don't see that as being too terrible of a situation.

Robin


* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-22  9:28                     ` Alan Cox
  2010-04-22 12:58                       ` Jack Steiner
  2010-04-22 13:57                       ` Robin Holt
@ 2010-04-22 14:48                       ` Linus Torvalds
  2010-04-22 17:08                         ` Robin Holt
  2010-04-25  7:16                         ` Pavel Machek
  2 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2010-04-22 14:48 UTC (permalink / raw)
  To: Alan Cox
  Cc: Greg KH, Rik van Riel, John Stoffel, Hedi Berriche, Mike Travis,
	Ingo Molnar, Jack Steiner, Andrew Morton, Robin Holt, LKML



On Thu, 22 Apr 2010, Alan Cox wrote:
>
> > Distros don't want to take a patch that adds a new boot param that is
> > not accepted upstream, otherwise they will be stuck forward porting it
> > from now until, well, forever :)
> 
> So for an obscure IA64 specific problem you want the upstream kernel to
> port it forward forever instead ?

Ehh. Nobody does ia64 any more. It's dead, Jim.

This is x86. SGI finally long ago gave up on the Intel/HP clusterf*ck.

Which I'm not entirely sure makes the case for the kernel parameter much 
stronger, though. I wonder if it's not more appropriate to just have a 
total hack saying

	if (max_pids < N * max_cpus) {
		printk("We have %d CPUs, increasing max_pids to %d\n");
		max_pids = N*max_cpus;
	}

where "N" is just some random fudge-factor. It's reasonable to expect a 
certain minimum number of processes per CPU, after all.

		Linus


* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-22 14:48                       ` Linus Torvalds
@ 2010-04-22 17:08                         ` Robin Holt
  2010-04-22 18:10                           ` John Stoffel
  2010-04-22 20:35                           ` Andrew Morton
  2010-04-25  7:16                         ` Pavel Machek
  1 sibling, 2 replies; 32+ messages in thread
From: Robin Holt @ 2010-04-22 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Greg KH, Rik van Riel, John Stoffel, Hedi Berriche,
	Mike Travis, Ingo Molnar, Jack Steiner, Andrew Morton,
	Robin Holt, LKML

> Which I'm not entirely sure makes the case for the kernel parameter much 
> stronger, though. I wonder if it's not more appropriate to just have a 
> total hack saying
> 
> 	if (max_pids < N * max_cpus) {
> 		printk("We have %d CPUs, increasing max_pids to %d\n");
> 		max_pids = N*max_cpus;
> 	}
> 
> where "N" is just some random fudge-factor. It's reasonable to expect a 
> certain minimum number of processes per CPU, after all.

How about:

	pid_max_min = max(pid_max_min, 19 * num_possible_cpus());
	pid_max_baseline = 2048 * num_possible_cpus();

	if (pid_max < pid_max_baseline) {
		printk("We have %d CPUs, increasing pid_max to %d\n"...
		pid_max = pid_max_baseline;
	}


This would scale pid_max_min by a sane amount, leave the default value
of pid_max_min and pid_max untouched below 16 cpus and then scale both
up linearly beyond that.

Robin
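
The "below 16 cpus" claim above can be checked with a little userspace
arithmetic.  This is only a sketch: it assumes the 2.6.32 starting values
of pid_max = 32768 and pid_max_min = 301 (RESERVED_PIDS + 1), which the
proposal does not state, and the struct and function names are made up
for illustration.

```c
#include <assert.h>

/* Assumed 2.6.32 starting values; not taken from the proposal itself. */
#define PID_MAX_DEFAULT_ASSUMED  32768  /* 0x8000 */
#define PID_MAX_MIN_ASSUMED      301    /* RESERVED_PIDS + 1 */

/* Multipliers from the proposal. */
#define MIN_PIDS_PER_CPU       19
#define BASELINE_PIDS_PER_CPU  2048

struct pid_limits {
	int pid_max;
	int pid_max_min;
};

/* Hypothetical helper: apply the proposed scaling for ncpus possible CPUs. */
struct pid_limits scale_pid_limits(int ncpus)
{
	struct pid_limits l = { PID_MAX_DEFAULT_ASSUMED, PID_MAX_MIN_ASSUMED };
	int baseline;

	assert(ncpus > 0);
	baseline = BASELINE_PIDS_PER_CPU * ncpus;

	/* pid_max_min = max(pid_max_min, 19 * ncpus) */
	if (l.pid_max_min < MIN_PIDS_PER_CPU * ncpus)
		l.pid_max_min = MIN_PIDS_PER_CPU * ncpus;
	/* raise pid_max to the per-CPU baseline if it is below it */
	if (l.pid_max < baseline)
		l.pid_max = baseline;
	return l;
}
```

Under those assumptions, 15 CPUs leave both values alone (19 * 15 = 285
and 2048 * 15 = 30720), while 16 CPUs raise pid_max_min to 304 and leave
pid_max at exactly 32768.  At 4096 CPUs the baseline is 8388608, which is
above the 64-bit PID_MAX_LIMIT of 4 * 1024 * 1024, so a real version would
presumably also need a cap.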


* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-22 17:08                         ` Robin Holt
@ 2010-04-22 18:10                           ` John Stoffel
  2010-04-22 20:35                           ` Andrew Morton
  1 sibling, 0 replies; 32+ messages in thread
From: John Stoffel @ 2010-04-22 18:10 UTC (permalink / raw)
  To: Robin Holt
  Cc: Linus Torvalds, Alan Cox, Greg KH, Rik van Riel, John Stoffel,
	Hedi Berriche, Mike Travis, Ingo Molnar, Jack Steiner,
	Andrew Morton, LKML

>>>>> "Robin" == Robin Holt <holt@sgi.com> writes:

>> Which I'm not entirely sure makes the case for the kernel parameter much 
>> stronger, though. I wonder if it's not more appropriate to just have a 
>> total hack saying
>> 
>> if (max_pids < N * max_cpus) {
>> printk("We have %d CPUs, increasing max_pids to %d\n");
>> max_pids = N*max_cpus;
>> }
>> 
>> where "N" is just some random fudge-factor. It's reasonable to expect a 
>> certain minimum number of processes per CPU, after all.

Robin> How about:

Robin> 	pid_max_min = max(pid_max_min, 19 * num_possible_cpus());
Robin> 	pid_max_baseline = 2048 * num_possible_cpus();

Robin> 	if (pid_max < pid_max_baseline) {
Robin> 		printk("We have %d CPUs, increasing pid_max to %d\n"...
Robin> 		pid_max = pid_max_baseline;
Robin> 	}


Robin> This would scale pid_max_min by a sane amount, leave the default value
Robin> of pid_max_min and pid_max untouched below 16 cpus and then scale both
Robin> up linearly beyond that.

Looks good, but how about some comments and some defines for the magic
numbers of 2048 and 19?  

John



* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-22 17:08                         ` Robin Holt
  2010-04-22 18:10                           ` John Stoffel
@ 2010-04-22 20:35                           ` Andrew Morton
  1 sibling, 0 replies; 32+ messages in thread
From: Andrew Morton @ 2010-04-22 20:35 UTC (permalink / raw)
  To: Robin Holt
  Cc: Linus Torvalds, Alan Cox, Greg KH, Rik van Riel, John Stoffel,
	Hedi Berriche, Mike Travis, Ingo Molnar, Jack Steiner, LKML

On Thu, 22 Apr 2010 12:08:02 -0500
Robin Holt <holt@sgi.com> wrote:

> > Which I'm not entirely sure makes the case for the kernel parameter much 
> > stronger, though. I wonder if it's not more appropriate to just have a 
> > total hack saying
> > 
> > 	if (max_pids < N * max_cpus) {
> > 		printk("We have %d CPUs, increasing max_pids to %d\n");
> > 		max_pids = N*max_cpus;
> > 	}
> > 
> > where "N" is just some random fudge-factor. It's reasonable to expect a 
> > certain minimum number of processes per CPU, after all.
> 
> How about:
> 
> 	pid_max_min = max(pid_max_min, 19 * num_possible_cpus());
> 	pid_max_baseline = 2048 * num_possible_cpus();
> 
> 	if (pid_max < pid_max_baseline) {
> 		printk("We have %d CPUs, increasing pid_max to %d\n"...
> 		pid_max = pid_max_baseline;
> 	}
> 
> 
> This would scale pid_max_min by a sane amount, leave the default value
> of pid_max_min and pid_max untouched below 16 cpus and then scale both
> up linearly beyond that.

Something like that would work.  We should ensure that pid_max cannot
end up being less than the current PID_MAX_DEFAULT.



* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-22 14:48                       ` Linus Torvalds
  2010-04-22 17:08                         ` Robin Holt
@ 2010-04-25  7:16                         ` Pavel Machek
  2010-04-25 17:15                           ` Linus Torvalds
  1 sibling, 1 reply; 32+ messages in thread
From: Pavel Machek @ 2010-04-25  7:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Greg KH, Rik van Riel, John Stoffel, Hedi Berriche,
	Mike Travis, Ingo Molnar, Jack Steiner, Andrew Morton,
	Robin Holt, LKML

Hi!

> > > Distros don't want to take a patch that adds a new boot param that is
> > > not accepted upstream, otherwise they will be stuck forward porting it
> > > from now until, well, forever :)
> > 
> > So for an obscure IA64 specific problem you want the upstream kernel to
> > port it forward forever instead ?
> 
> Ehh. Nobody does ia64 any more. It's dead, Jim.
> 
> This is x86. SGI finally long ago gave up on the Intel/HP clusterf*ck.
> 
> Which I'm not entirely sure makes the case for the kernel parameter much 
> stronger, though. I wonder if it's not more appropriate to just have a 
> total hack saying
> 
> 	if (max_pids < N * max_cpus) {
> 		printk("We have %d CPUs, increasing max_pids to %d\n");
> 		max_pids = N*max_cpus;
> 	}
> 
> where "N" is just some random fudge-factor. It's reasonable to expect a 
> certain minimum number of processes per CPU, after all.

Issue with max_pids is that it can break userspace, right?

At that point it seems saner to require a parameter --- just adding
cpus to the system should not do it... 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-25 17:27                             ` Linus Torvalds
@ 2010-04-25 12:13                               ` Pavel Machek
  0 siblings, 0 replies; 32+ messages in thread
From: Pavel Machek @ 2010-04-25 12:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Greg KH, Rik van Riel, John Stoffel, Hedi Berriche,
	Mike Travis, Ingo Molnar, Jack Steiner, Andrew Morton,
	Robin Holt, LKML

Hi!

> > Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be 
> > really safe when we raised the limits. 
> 
> .. I dug into the history, and this is from August 2002..
> 
> We used to limit it to sixteen bits, but that was too tight even then for 
> some people, so first we did this:
> 
>     Author: Linus Torvalds <torvalds@home.transmeta.com>
>     Date:   Thu Aug 8 03:57:42 2002 -0700
>     
>         Make pid allocation use 30 of the 32 bits, instead of 15.
...
> which just upped the limits.  That, in turn, _did_ end up breaking some
> silly old binaries, so then a month later Ingo did a "pid-max" patch
> that made the maximum dynamic, with a default of the old 15-bit limit,
> and a sysctl to raise it. 
> 
> And then a couple of weeks later, Ingo did another patch to fix the
> scalability problems we had with lots of pids (avoiding the whole
> "for_each_task()" crud to figure out which pids were ok, and using a
> 'struct pid' instead).
> 
> So the whole worry about > 15-bit pids goes back to 2002.  I think we're
> pretty safe now. 

From a principle of least surprise PoV: breaking old userspace when you
pass a special config option is less surprising than breaking old
userspace when you add more CPUs.

Whether the breakage will be common enough for this to matter is another
question.
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-25  7:16                         ` Pavel Machek
@ 2010-04-25 17:15                           ` Linus Torvalds
  2010-04-25 17:27                             ` Linus Torvalds
  2010-04-26 19:48                             ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3 Mike Travis
  0 siblings, 2 replies; 32+ messages in thread
From: Linus Torvalds @ 2010-04-25 17:15 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, Greg KH, Rik van Riel, John Stoffel, Hedi Berriche,
	Mike Travis, Ingo Molnar, Jack Steiner, Andrew Morton,
	Robin Holt, LKML



On Sun, 25 Apr 2010, Pavel Machek wrote:
> 
> Issue with max_pids is that it can break userspace, right?

Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be 
really safe when we raised the limits. 

I seriously doubt we need to worry about old binaries like that on any 16+ 
CPU machines, though.

The other issue is just the size of the pidmap[] array. Instead of walking 
all the processes to see "is this pid in use" (like I think the original 
Linux kernel did), we have a bitmap of used pids. When you raise pid_max, 
that bitmap obviously still needs to be big enough. Right now we allocate 
that statically (rather than growing it dynamically), so we end up having 
a _hard_ limit of PID_MAX_LIMIT too.

On 32-bit, I think the maximum limit still ends up being basically 32767. 
So again, on a _legacy_ system, you end up being limited in the number of 
pid_t entries.

				Linus
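
To put a rough number on the bitmap cost described above, here is a small
sketch.  It assumes one bit per pid and whole-page granularity; the helper
name is hypothetical and this is not the kernel's own sizing code.

```c
#include <assert.h>

#define BITS_PER_BYTE 8

/*
 * Hypothetical helper: number of pages needed for a one-bit-per-pid
 * bitmap covering pid_limit pids, rounded up to whole pages.
 */
unsigned long pidmap_pages(unsigned long pid_limit, unsigned long page_size)
{
	unsigned long bits_per_page = page_size * BITS_PER_BYTE;

	assert(page_size > 0);
	return (pid_limit + bits_per_page - 1) / bits_per_page;
}
```

With 4 KiB pages (an assumption), a limit of 32768 pids fits in a single
page, while the 64-bit PID_MAX_LIMIT of 4 * 1024 * 1024 pids would need
128 pages, i.e. 512 KiB of bitmap if every page were populated.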


* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2
  2010-04-25 17:15                           ` Linus Torvalds
@ 2010-04-25 17:27                             ` Linus Torvalds
  2010-04-25 12:13                               ` Pavel Machek
  2010-04-26 19:48                             ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3 Mike Travis
  1 sibling, 1 reply; 32+ messages in thread
From: Linus Torvalds @ 2010-04-25 17:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, Greg KH, Rik van Riel, John Stoffel, Hedi Berriche,
	Mike Travis, Ingo Molnar, Jack Steiner, Andrew Morton,
	Robin Holt, LKML


On Sun, 25 Apr 2010, Linus Torvalds wrote:
> 
> Iirc, some _really_ old code used 'short' for pid_t, and we wanted to be 
> really safe when we raised the limits. 

.. I dug into the history, and this is from August 2002..

We used to limit it to sixteen bits, but that was too tight even then for 
some people, so first we did this:

    Author: Linus Torvalds <torvalds@home.transmeta.com>
    Date:   Thu Aug 8 03:57:42 2002 -0700
    
        Make pid allocation use 30 of the 32 bits, instead of 15.
    
    diff --git a/include/linux/threads.h b/include/linux/threads.h
    index 880b990..6804ee7 100644
    --- a/include/linux/threads.h
    +++ b/include/linux/threads.h
    @@ -19,6 +19,7 @@
     /*
      * This controls the maximum pid allocated to a process
      */
    -#define PID_MAX 0x8000
    +#define PID_MASK 0x3fffffff
    +#define PID_MAX (PID_MASK+1)
     
     #endif
    diff --git a/kernel/fork.c b/kernel/fork.c
    index d40d246..017740d 100644
    --- a/kernel/fork.c
    +++ b/kernel/fork.c
    @@ -142,7 +142,7 @@ static int get_pid(unsigned long flags)
                    return 0;
     
            spin_lock(&lastpid_lock);
    -       if((++last_pid) & 0xffff8000) {
    +       if((++last_pid) & ~PID_MASK) {
                    last_pid = 300;         /* Skip daemons etc. */
                    goto inside;
            }
    @@ -157,7 +157,7 @@ inside:
                               p->tgid == last_pid  ||
                               p->session == last_pid) {
                                    if(++last_pid >= next_safe) {
    -                                       if(last_pid & 0xffff8000)
    +                                       if(last_pid & ~PID_MASK)
                                                    last_pid = 300;
                                            next_safe = PID_MAX;
                                    }

which just upped the limits.  That, in turn, _did_ end up breaking some
silly old binaries, so then a month later Ingo did a "pid-max" patch
that made the maximum dynamic, with a default of the old 15-bit limit,
and a sysctl to raise it. 

And then a couple of weeks later, Ingo did another patch to fix the
scalability problems we had with lots of pids (avoiding the whole
"for_each_task()" crud to figure out which pids were ok, and using a
'struct pid' instead).

So the whole worry about > 15-bit pids goes back to 2002.  I think we're
pretty safe now. 

			Linus
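
The wrap test that the 2002 diff changes can be illustrated in isolation.
This is a simplified sketch, not the historical get_pid(): the function
name and the macro names other than the mask values are invented here,
and the locking and the scan for pids still in use are left out.

```c
#include <assert.h>

#define PID_MASK_OLD 0x00007fff  /* 15 bits: the old test was (pid & 0xffff8000) */
#define PID_MASK_NEW 0x3fffffff  /* 30 bits, as in the 2002 patch */
#define FIRST_RECYCLED_PID 300   /* "Skip daemons etc." */

/*
 * Hypothetical helper: advance last_pid and wrap back to 300 when any
 * bit outside mask becomes set.  The real get_pid() also held
 * lastpid_lock and skipped pids that were still in use.
 */
int next_pid(int last_pid, int mask)
{
	assert(last_pid >= 0);
	++last_pid;
	if (last_pid & ~mask)
		last_pid = FIRST_RECYCLED_PID;
	return last_pid;
}
```

With the old 15-bit mask, stepping past 0x7fff wraps to 300.  With the
30-bit mask the same step simply yields 0x8000, and the wrap only happens
after 0x3fffffff.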



* [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3
  2010-04-25 17:15                           ` Linus Torvalds
  2010-04-25 17:27                             ` Linus Torvalds
@ 2010-04-26 19:48                             ` Mike Travis
  2010-04-26 20:46                               ` Greg KH
  2010-04-27  0:42                               ` [Patch 1/1] init: Increase pid_max based on num_possible_cpus v4 Mike Travis
  1 sibling, 2 replies; 32+ messages in thread
From: Mike Travis @ 2010-04-26 19:48 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Pavel Machek, Alan Cox, Greg KH, Rik van Riel, John Stoffel,
	Hedi Berriche, Jack Steiner, Andrew Morton, Robin Holt, LKML

Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3
From: Hedi Berriche <hedi@sgi.com>

On a system with a substantial number of processors, the early default pid_max
of 32k will not be enough.  On a system with 1664 CPUs, 25163 processes are
started before the login prompt.  It's estimated that with 2048 CPUs we will pass
the 32k limit.  With 4096, we'll reach that limit very early during the boot cycle,
and processes would stall waiting for an available pid.

This patch increases the early maximum number of pids available, and increases
the minimum number of pids that can be set during runtime.

Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>

---
 include/linux/threads.h |    9 +++++++++
 kernel/pid.c            |    7 +++++++
 2 files changed, 16 insertions(+)

--- linux-2.6.32.orig/include/linux/threads.h
+++ linux-2.6.32/include/linux/threads.h
@@ -33,4 +33,13 @@
 #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
 	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
 
+/*
+ * Define a minimum number of pids per cpu.  Heuristically based
+ * on original pid max of 32k for 32 cpus.  Also, increase the
+ * minimum settable value for pid_max on the running system based
+ * on similar defaults.  See kernel/pid.c:pidmap_init() for details.
+ */
+#define PIDS_PER_CPU_DEFAULT	1024
+#define PIDS_PER_CPU_MIN	8
+
 #endif
--- linux-2.6.32.orig/kernel/pid.c
+++ linux-2.6.32/kernel/pid.c
@@ -511,6 +511,13 @@ void __init pidhash_init(void)
 
 void __init pidmap_init(void)
 {
+	/* bump default and minimum pid_max based on number of cpus */
+	pid_max = min(pid_max_max, max(pid_max,
+				PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
+	pid_max_min = max(pid_max_min,
+				PIDS_PER_CPU_MIN * num_possible_cpus());
+	pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
+
 	init_pid_ns.pidmap[0].page = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	/* Reserve PID 0. We never call free_pidmap(0) */
 	set_bit(0, init_pid_ns.pidmap[0].page);



* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3
  2010-04-26 19:48                             ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3 Mike Travis
@ 2010-04-26 20:46                               ` Greg KH
  2010-04-27  0:43                                 ` Mike Travis
  2010-04-27  0:42                               ` [Patch 1/1] init: Increase pid_max based on num_possible_cpus v4 Mike Travis
  1 sibling, 1 reply; 32+ messages in thread
From: Greg KH @ 2010-04-26 20:46 UTC (permalink / raw)
  To: Mike Travis
  Cc: Linus Torvalds, Ingo Molnar, Pavel Machek, Alan Cox,
	Rik van Riel, John Stoffel, Hedi Berriche, Jack Steiner,
	Andrew Morton, Robin Holt, LKML

On Mon, Apr 26, 2010 at 12:48:09PM -0700, Mike Travis wrote:
> Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3

Your subject is now incorrect, based on the patch.  You should also
adjust the body of the changelog to reflect the code change.

thanks,

greg k-h


* [Patch 1/1] init: Increase pid_max based on num_possible_cpus v4
  2010-04-26 19:48                             ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3 Mike Travis
  2010-04-26 20:46                               ` Greg KH
@ 2010-04-27  0:42                               ` Mike Travis
  1 sibling, 0 replies; 32+ messages in thread
From: Mike Travis @ 2010-04-27  0:42 UTC (permalink / raw)
  To: Linus Torvalds, Ingo Molnar
  Cc: Pavel Machek, Alan Cox, Greg KH, Rik van Riel, John Stoffel,
	Hedi Berriche, Jack Steiner, Andrew Morton, Robin Holt, LKML

Subject: [Patch 1/1] init: Increase pid_max based on num_possible_cpus v4
From: Hedi Berriche <hedi@sgi.com>

On a system with a substantial number of processors, the early default pid_max
of 32k will not be enough.  On a system with 1664 CPUs, 25163 processes are
started before the login prompt.  It's estimated that with 2048 CPUs we will pass
the 32k limit.  With 4096, we'll reach that limit very early during the boot cycle,
and processes would stall waiting for an available pid.

This patch increases the early maximum number of pids available, and increases
the minimum number of pids that can be set during runtime.

Signed-off-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Robin Holt <holt@sgi.com>

---
Version 4: Fix subject line

Version 3: Automatically increase pid_max based on number of cpus instead of
adding a cmdline option for the operator to set it.
---
 include/linux/threads.h |    9 +++++++++
 kernel/pid.c            |    7 +++++++
 2 files changed, 16 insertions(+)

--- linux-2.6.32.orig/include/linux/threads.h
+++ linux-2.6.32/include/linux/threads.h
@@ -33,4 +33,13 @@
 #define PID_MAX_LIMIT (CONFIG_BASE_SMALL ? PAGE_SIZE * 8 : \
 	(sizeof(long) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT))
 
+/*
+ * Define a minimum number of pids per cpu.  Heuristically based
+ * on original pid max of 32k for 32 cpus.  Also, increase the
+ * minimum settable value for pid_max on the running system based
+ * on similar defaults.  See kernel/pid.c:pidmap_init() for details.
+ */
+#define PIDS_PER_CPU_DEFAULT	1024
+#define PIDS_PER_CPU_MIN	8
+
 #endif
--- linux-2.6.32.orig/kernel/pid.c
+++ linux-2.6.32/kernel/pid.c
@@ -511,6 +511,13 @@ void __init pidhash_init(void)
 
 void __init pidmap_init(void)
 {
+	/* bump default and minimum pid_max based on number of cpus */
+	pid_max = min(pid_max_max, max(pid_max,
+				PIDS_PER_CPU_DEFAULT * num_possible_cpus()));
+	pid_max_min = max(pid_max_min,
+				PIDS_PER_CPU_MIN * num_possible_cpus());
+	pr_info("pid_max: default: %u minimum: %u\n", pid_max, pid_max_min);
+
 	init_pid_ns.pidmap[0].page = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	/* Reserve PID 0. We never call free_pidmap(0) */
 	set_bit(0, init_pid_ns.pidmap[0].page);
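
The effect of the two assignments added to pidmap_init() can be worked
through in userspace.  This is a sketch under assumptions the patch does
not spell out: a 64-bit build with pid_max starting at 32768, pid_max_min
at 301 (RESERVED_PIDS + 1) and pid_max_max equal to PID_MAX_LIMIT.  The
function names and the *_START macros are invented for illustration.

```c
#include <assert.h>

/* Values from the patch. */
#define PIDS_PER_CPU_DEFAULT 1024
#define PIDS_PER_CPU_MIN     8

/* Assumed 64-bit 2.6.32 starting values (not part of the patch). */
#define PID_MAX_START      32768              /* PID_MAX_DEFAULT */
#define PID_MAX_MIN_START  301                /* RESERVED_PIDS + 1 */
#define PID_MAX_MAX        (4 * 1024 * 1024)  /* PID_MAX_LIMIT */

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper mirroring the pid_max assignment in the patch. */
int boot_pid_max(int ncpus)
{
	assert(ncpus > 0);
	return min_int(PID_MAX_MAX,
		       max_int(PID_MAX_START, PIDS_PER_CPU_DEFAULT * ncpus));
}

/* Hypothetical helper mirroring the pid_max_min assignment in the patch. */
int boot_pid_max_min(int ncpus)
{
	assert(ncpus > 0);
	return max_int(PID_MAX_MIN_START, PIDS_PER_CPU_MIN * ncpus);
}
```

Under these assumptions pid_max stays at 32768 up to 32 CPUs, becomes
1703936 on the 1664-CPU system from the changelog, and reaches the 4M cap
at 4096 CPUs; anything larger is clamped there.  pid_max_min stays at 301
through 37 CPUs and becomes 32768 at 4096 CPUs.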



* Re: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3
  2010-04-26 20:46                               ` Greg KH
@ 2010-04-27  0:43                                 ` Mike Travis
  0 siblings, 0 replies; 32+ messages in thread
From: Mike Travis @ 2010-04-27  0:43 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Ingo Molnar, Pavel Machek, Alan Cox,
	Rik van Riel, John Stoffel, Hedi Berriche, Jack Steiner,
	Andrew Morton, Robin Holt, LKML



Greg KH wrote:
> On Mon, Apr 26, 2010 at 12:48:09PM -0700, Mike Travis wrote:
>> Subject: [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3
> 
> Your subject is now incorrect, based on the patch.  You should also
> adjust the body of the changelog to reflect the code change.
> 
> thanks,
> 
> greg k-h

Thanks for that catch.  I had changed the name of the patch, but not the subject.

-Mike



Thread overview: 32+ messages
2010-04-21  1:40 [Patch 1/1] init: Provide a kernel start parameter to increase pid_max Mike Travis
2010-04-21  1:52 ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2 Mike Travis
2010-04-21  9:23   ` Alan Cox
2010-04-21 16:59     ` Hedi Berriche
2010-04-21 17:18       ` Rik van Riel
2010-04-21 17:54         ` Mike Travis
2010-04-21 19:14         ` John Stoffel
2010-04-21 19:33           ` Hedi Berriche
2010-04-21 20:10             ` John Stoffel
2010-04-21 22:24               ` Greg KH
2010-04-21 22:49                 ` Rik van Riel
2010-04-21 23:22                   ` Greg KH
2010-04-22  9:28                     ` Alan Cox
2010-04-22 12:58                       ` Jack Steiner
2010-04-22 13:57                       ` Robin Holt
2010-04-22 14:48                       ` Linus Torvalds
2010-04-22 17:08                         ` Robin Holt
2010-04-22 18:10                           ` John Stoffel
2010-04-22 20:35                           ` Andrew Morton
2010-04-25  7:16                         ` Pavel Machek
2010-04-25 17:15                           ` Linus Torvalds
2010-04-25 17:27                             ` Linus Torvalds
2010-04-25 12:13                               ` Pavel Machek
2010-04-26 19:48                             ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v3 Mike Travis
2010-04-26 20:46                               ` Greg KH
2010-04-27  0:43                                 ` Mike Travis
2010-04-27  0:42                               ` [Patch 1/1] init: Increase pid_max based on num_possible_cpus v4 Mike Travis
2010-04-21 17:58       ` [Patch 1/1] init: Provide a kernel start parameter to increase pid_max v2 Alan Cox
2010-04-21 19:12         ` Hedi Berriche
2010-04-21 19:51           ` Greg KH
2010-04-21 20:12             ` Hedi Berriche
2010-04-21 22:05           ` Jack Steiner
