From: Hubertus Franke <frankeh@watson.ibm.com>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
davej@suse.de, marcelo@conectiva.com.br
Cc: "Rajan Ravindran" <rajancr@us.ibm.com>,
linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net
Subject: [PATCH] get_pid() performance fix
Date: Fri, 22 Mar 2002 17:14:03 -0500
Message-ID: <20020322221318.5F6083FE06@smtp.linux.ibm.com>
In-Reply-To: <OF810580E6.8672B341-ON85256B73.005AF9B8@pok.ibm.com> <20020315183610.212993FE06@smtp.linux.ibm.com> <87zo19jdu7.fsf@devron.myhome.or.jp>
I implemented an alternative version of get_pid() that, for large thread
counts (> ~22000), provides significantly better performance, as shown in
the attached performance analysis. This is particularly relevant for
PID_MAX=32768: note that under the current algorithm, allocating the last
pid takes 32 seconds (!!!) on a 500 MHz P-III.
I have addressed the correctness issue that Hirofumi brought up with the
last submission (now covered by test case "-c 4").
Attached patch is against 2.5.7.
-- Hubertus Franke <frankeh@watson.ibm.com>
------------------------------------------------------------------------------
Currently the get_pid() algorithm works as follows:
At any given time an interval [ last_pid .. next_safe ) is known to hold
unused pids; initially the interval is set to [ 0 .. PID_MAX ) with
PID_MAX = 32768. Hence, to allocate a new pid the following fast path is
sufficient:
	if (++last_pid < next_safe) return last_pid;
However, if we step outside the safe interval, the next safe interval
needs to be established first.
This is currently done by a repetitive search:
repeat:
	for_each_task(p) {
		if (p uses last_pid) { last_pid++; goto repeat; }
		/* narrow [ last_pid .. next_safe ) */
		if (one of p's ids is in ( last_pid .. next_safe )) next_safe = that id;
	}
Particularly for a large number of tasks this leads to frequent restarts
at the repeat label, resulting in an O(N^2) algorithm. We call this <algo-0>.
Instead, I have provided an alternative mechanism that, at the time the
next interval must be determined, marks a bitmap by walking the task list
once [ O(N) ] and then finds the proper bit offsets to locate the next
free interval starting from last_pid. The bitmap requires 4096 bytes.
This is <algo-1>.
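A rough user-level sketch of <algo-1> (assuming a byte-granular bitmap for
simplicity where the patch uses unsigned longs and ffz(); pids[] again
stands in for the task list, and the function name is invented):

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define PID_MAX 32768

/* 32768 bits = 4096 bytes, matching the size quoted above. */
static unsigned char pid_map1[PID_MAX / CHAR_BIT];

/* User-level sketch of <algo-1>: clear the map, mark every used pid in
 * one O(N) pass, then scan for the first free bit at or above 'from'. */
int algo1_next_pid(const int *pids, size_t n, int from)
{
	size_t i;
	int pid;

	memset(pid_map1, 0, sizeof(pid_map1));
	for (i = 0; i < n; i++)		/* single walk of the "task list" */
		pid_map1[pids[i] / CHAR_BIT] |= 1u << (pids[i] % CHAR_BIT);

	for (pid = from; pid < PID_MAX; pid++)	/* find first clear bit */
		if (!(pid_map1[pid / CHAR_BIT] & (1u << (pid % CHAR_BIT))))
			return pid;
	return -1;			/* no free pid at all */
}
```

The kernel version additionally records the next set bit as next_safe, so
the fast path can keep allocating without touching the map again.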
An optimization to this is to keep the bitmap from the last search
instead of clearing it on every search. Only if we fail to obtain a free
range do we go back, clear the bitmap, and redo the search one more time.
This is <algo-2>. Here I omit the separate results, as they showed
improvements only for the very last few pids.
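<algo-2> can be sketched the same way: the bitmap persists across calls
and is only cleared and re-marked when a scan comes up empty. This is a
user-level approximation with invented names; it assumes every allocation
goes through algo2_next_pid so claimed pids stay marked between calls:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define PID_MAX 32768

static unsigned char pid_map2[PID_MAX / CHAR_BIT];	/* persists across calls */

static int pid_bit(int pid)
{
	return pid_map2[pid / CHAR_BIT] & (1u << (pid % CHAR_BIT));
}

static void claim(int pid)
{
	pid_map2[pid / CHAR_BIT] |= 1u << (pid % CHAR_BIT);
}

static int scan_free(int from)
{
	int pid;
	for (pid = from; pid < PID_MAX; pid++)
		if (!pid_bit(pid))
			return pid;
	return -1;
}

/* Sketch of <algo-2>: trust the (possibly stale, but conservative) map
 * first; only when it yields nothing, clear it, re-mark from the "task
 * list" pids[], and scan once more. */
int algo2_next_pid(const int *pids, size_t n, int from)
{
	int pid = scan_free(from);
	size_t i;

	if (pid < 0) {				/* stale map exhausted: rebuild */
		memset(pid_map2, 0, sizeof(pid_map2));
		for (i = 0; i < n; i++)
			claim(pids[i]);
		pid = scan_free(from);		/* -1 here: genuinely out of pids */
	}
	if (pid >= 0)
		claim(pid);			/* record the allocation in the map */
	return pid;
}
```

Stale bits for exited tasks only make the map conservative (a free pid may
be skipped until the next rebuild); it never hands out a pid that is in use.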
I pulled the various algorithms into a user-level test program to figure
out where the cut-off points are with PID_MAX=32768. The test program
maintains <TH> tasks and, for 10 rounds, deletes <DEL> random tasks and
reallocates <DEL> tasks again, resulting in T=10*<DEL> total measured
allocations. In the table below, C-i states how many interval searches
were needed for algo-i, and T-i states the average overhead per
get_pid() for algo-i in usecs.
Based on these results one should use the current algorithm up to roughly
22-25K tasks and beyond that switch to algo-2. Only for the very last ~15
pids is algo-1 a bit faster, which we can safely ignore. Providing an
adaptive implementation therefore seems the right choice.
The patch for 2.5.7 is attached.
Example invocation: getpid -c 2 -s 10 -e 100 -D 10 -t <0|1>
where -t 0 selects the old algorithm and -t 1 selects the new algo-2.
TH:    average thread count in the system
DEL:   number of threads randomly deleted and then reallocated
TAL:   total number of get_pid() invocations
C-NEW: number of times the interval search was invoked (new algorithm)
T-NEW: per-get_pid() overhead in usecs (new algorithm)
C-OLD: number of times the interval search was invoked (old algorithm)
T-OLD: per-get_pid() overhead in usecs (old algorithm)
TH DEL TAL | C-NEW T-NEW | C-OLD T-OLD
10 10 80 | 1 0.79 | 1 0.41
20 10 100 | 1 0.56 | 1 0.36
30 10 100 | 1 0.56 | 1 0.38
40 10 100 | 1 0.58 | 1 0.42
50 10 100 | 1 0.59 | 1 0.43
60 10 100 | 2 0.84 | 2 0.52
70 10 100 | 1 0.72 | 1 0.43
80 10 100 | 1 0.72 | 1 0.48
100 50 500 | 2 0.38 | 2 0.27
150 50 500 | 4 0.56 | 3 0.32
200 50 500 | 5 0.79 | 4 0.40
250 50 500 | 6 1.03 | 2 0.35
300 50 500 | 9 1.68 | 2 0.38
350 50 500 | 6 1.47 | 4 0.58
400 50 500 | 10 2.54 | 8 1.05
450 50 500 | 7 2.10 | 7 1.02
500 50 500 | 7 2.32 | 4 0.75
550 50 500 | 9 3.13 | 5 0.95
600 50 500 | 14 5.14 | 6 1.18
650 50 500 | 13 5.23 | 9 1.79
700 50 500 | 10 4.35 | 9 1.87
750 50 500 | 15 6.91 | 8 2.02
800 50 500 | 12 5.98 | 8 1.95
850 50 500 | 13 6.85 | 9 2.29
900 50 500 | 27 14.55 | 18 4.46
1000 1000 9980 | 1 0.21 | 1 0.21
2000 1000 10000 | 322 14.36 | 76 2.03
3000 1000 10000 | 606 46.10 | 161 6.63
4000 1000 10000 | 901 97.58 | 359 18.87
5000 1000 10000 | 1165 164.19 | 539 38.75
6000 1000 10000 | 1498 266.58 | 724 66.96
7000 1000 10000 | 1835 396.91 | 1026 122.35
8000 1000 10000 | 2084 531.83 | 1228 179.70
9000 1000 10000 | 2489 746.99 | 1516 264.16
10000 1000 10000 | 2763 946.99 | 1818 375.22
11000 1000 10000 | 3095 1199.73 | 2093 502.47
12000 1000 10000 | 3277 1422.02 | 2305 648.17
13000 1000 10000 | 3711 1776.28 | 2680 854.89
14000 1000 10000 | 3878 2033.30 | 2959 1061.61
15000 1000 10000 | 4301 2452.35 | 3167 1266.78
16000 1000 10000 | 4485 2757.00 | 3466 1554.22
17000 1000 10000 | 4844 3210.19 | 3743 1857.74
18000 1000 10000 | 5218 3681.90 | 4069 2247.13
19000 1000 10000 | 5501 4118.69 | 4293 2605.90
20000 1000 10000 | 5770 4594.99 | 4616 3095.39
21000 1000 10000 | 6161 5172.44 | 4974 3686.87
22000 1000 10000 | 6389 5637.80 | 5177 4258.81
23000 1000 10000 | 6665 6191.73 | 5483 4949.61
24000 1000 10000 | 6982 6777.02 | 5838 5831.25
25000 1000 10000 | 7209 7318.15 | 6183 6905.34
----------------> Break even range <------------------------
26000 1000 10000 | 7533 7954.33 | 6413 7955.66
27000 1000 10000 | 7959 8749.97 | 6748 9444.79
28000 1000 10000 | 8140 9302.36 | 7139 11617.75
29000 1000 10000 | 8501 10085.77 | 7638 14960.53
30000 1000 10000 | 8911 10946.80 | 8178 19584.24
32000 50 500 | 487 12265.89 | 486 94498.36
32050 50 500 | 486 12314.17 | 486 95832.03
32100 50 500 | 486 12389.38 | 488 108895.28
32150 50 500 | 492 12599.58 | 484 111742.39
32200 50 500 | 497 12792.36 | 491 124440.45
32250 50 500 | 490 12659.33 | 490 134499.09
32300 50 500 | 495 12843.78 | 489 145873.72
32350 50 500 | 495 12915.18 | 493 158940.66
32400 50 500 | 496 12988.64 | 494 183092.45
32450 50 500 | 492 12924.86 | 490 196135.28
32500 50 500 | 495 13043.85 | 488 223226.98
32550 50 500 | 498 13164.09 | 495 265431.10
32600 50 500 | 498 13222.56 | 495 326363.36
32650 50 500 | 498 13279.90 | 497 441002.09
32700 50 500 | 499 13368.66 | 499 656269.52
32751 1 10 | 10 12836.40 | 10 4031660.40
32752 1 10 | 10 12831.70 | 10 4620214.70
32753 1 10 | 10 12832.60 | 10 4188605.70
32754 1 10 | 10 12837.60 | 10 2972975.40
32755 1 10 | 10 12835.90 | 10 4434635.70
32756 1 10 | 10 12833.80 | 10 3892058.30
32757 1 10 | 10 12839.10 | 10 5002365.30
32758 1 10 | 10 12840.20 | 10 6332786.20
32759 1 10 | 10 12840.20 | 10 5269462.90
32760 1 10 | 10 14127.80 | 10 8234780.40
32761 1 10 | 10 14134.90 | 10 7558043.20
32762 1 10 | 10 14129.70 | 10 9117119.40
32763 1 10 | 10 14134.70 | 10 13895498.10
32764 1 10 | 10 14140.10 | 10 13608972.90
32765 1 10 | 10 15427.30 | 10 12930469.20
32766 1 10 | 10 16708.30 | 10 23576610.90
32767 1 10 | 10 17997.90 | 10 32603396.10
diff -urbN linux-2.5.7/kernel/fork.c linux-2.5.7-pid/kernel/fork.c
--- linux-2.5.7/kernel/fork.c Mon Mar 18 15:37:05 2002
+++ linux-2.5.7-pid/kernel/fork.c Fri Mar 22 16:38:29 2002
@@ -125,24 +125,111 @@
/* Protects next_safe and last_pid. */
spinlock_t lastpid_lock = SPIN_LOCK_UNLOCKED;
+/* this should be provided in every architecture */
+#ifndef SHIFT_PER_LONG
+#if BITS_PER_LONG == 64
+# define SHIFT_PER_LONG 6
+#elif BITS_PER_LONG == 32
+# define SHIFT_PER_LONG 5
+#else
+#error "SHIFT_PER_LONG"
+#endif
+#endif
+
+#define RESERVED_PIDS (300)
+#define GETPID_THRESHOLD (26000) /* when to switch to a mapped algo */
+#define PID_MAP_SIZE (PID_MAX >> SHIFT_PER_LONG)
+static unsigned long pid_map[PID_MAP_SIZE];
+static int next_safe = PID_MAX;
+
+static inline void mark_pid(int pid)
+{
+ __set_bit(pid,pid_map);
+}
+
+static int get_pid_by_map(int lastpid)
+{
+ static int mark_and_sweep = 0;
+
+ int round = 0;
+ struct task_struct *p;
+ int i;
+ unsigned long mask;
+ int fpos;
+
+
+ if (mark_and_sweep) {
+repeat:
+ mark_and_sweep = 0;
+ memset(pid_map, 0, PID_MAP_SIZE * sizeof(unsigned long));
+ lastpid = RESERVED_PIDS;
+ }
+ for_each_task(p) {
+ mark_pid(p->pid);
+ mark_pid(p->pgrp);
+ mark_pid(p->tgid);
+ mark_pid(p->session);
+ }
+
+ /* find next free pid */
+ i = (lastpid >> SHIFT_PER_LONG);
+ mask = pid_map[i] | ((1UL << (lastpid & (BITS_PER_LONG-1))) - 1);
+
+ while ((mask == ~0) && (++i < PID_MAP_SIZE))
+ mask = pid_map[i];
+
+ if (i == PID_MAP_SIZE) {
+ if (round == 0) {
+ round = 1;
+ goto repeat;
+ }
+ next_safe = RESERVED_PIDS;
+ mark_and_sweep = 1; /* mark next time */
+ return 0;
+ }
+
+ fpos = ffz(mask);
+ i &= (PID_MAX-1);
+ lastpid = (i << SHIFT_PER_LONG) + fpos;
+
+ /* find next safe pid */
+ mask &= ~((1UL << fpos) - 1);
+
+ while ((mask == 0) && (++i < PID_MAP_SIZE))
+ mask = pid_map[i];
+
+ if (i==PID_MAP_SIZE)
+ next_safe = PID_MAX;
+ else
+ next_safe = (i << SHIFT_PER_LONG) + ffs(mask) - 1;
+ return lastpid;
+}
+
static int get_pid(unsigned long flags)
{
- static int next_safe = PID_MAX;
struct task_struct *p;
- int pid;
+ int pid,beginpid;
if (flags & CLONE_PID)
return current->pid;
spin_lock(&lastpid_lock);
+ beginpid = last_pid;
if((++last_pid) & 0xffff8000) {
- last_pid = 300; /* Skip daemons etc. */
+ last_pid = RESERVED_PIDS; /* Skip daemons etc. */
goto inside;
}
if(last_pid >= next_safe) {
inside:
next_safe = PID_MAX;
read_lock(&tasklist_lock);
+ if (nr_threads > GETPID_THRESHOLD) {
+ last_pid = get_pid_by_map(last_pid);
+ if (unlikely(last_pid == 0)) {
+ last_pid = PID_MAX;
+ goto nomorepids;
+ }
+ } else {
repeat:
for_each_task(p) {
if(p->pid == last_pid ||
@@ -151,9 +238,11 @@
p->session == last_pid) {
if(++last_pid >= next_safe) {
if(last_pid & 0xffff8000)
- last_pid = 300;
+ last_pid = RESERVED_PIDS;
next_safe = PID_MAX;
}
+ if(unlikely(last_pid == beginpid))
+ goto nomorepids;
goto repeat;
}
if(p->pid > last_pid && next_safe > p->pid)
@@ -162,6 +251,9 @@
next_safe = p->pgrp;
if(p->session > last_pid && next_safe > p->session)
next_safe = p->session;
+ if(p->tgid > last_pid && next_safe > p->tgid)
+ next_safe = p->tgid;
+ }
}
read_unlock(&tasklist_lock);
}
@@ -169,6 +261,10 @@
spin_unlock(&lastpid_lock);
return pid;
+nomorepids:
+ read_unlock(&tasklist_lock);
+ spin_unlock(&lastpid_lock);
+ return 0;
}
static inline int dup_mmap(struct mm_struct * mm)