linux-kernel.vger.kernel.org archive mirror
* pre6 BUG oops
@ 2001-10-31 17:25 Jeff Garzik
  2001-10-31 17:41 ` David S. Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Garzik @ 2001-10-31 17:25 UTC (permalink / raw)
  To: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 355 bytes --]

2.4.14-pre6 on UP alpha, newly reformatted and reinstalled :)

Machine was nowhere near OOM, and no other adverse messages appeared in
dmesg.  An rpm build and an rpm install were running in parallel.

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno

[-- Attachment #2: oops.txt --]
[-- Type: text/plain, Size: 3054 bytes --]

ksymoops 2.4.1 on alpha 2.4.14-pre6.  Options used
     -v /home/jgarzik/cvs/linux_2_4/vmlinux (specified)
     -k /proc/ksyms (specified)
     -l /proc/modules (specified)
     -o /lib/modules/2.4.14-pre6/ (specified)
     -m /boot/System.map-2.4.14-pre6 (specified)

Warning (compare_maps): mismatch on symbol packet_socks_nr  , af_packet says fffffffc002c2584, /lib/modules/2.4.14-pre6/kernel/net/packet/af_packet.o says fffffffc002be004.  Ignoring /lib/modules/2.4.14-pre6/kernel/net/packet/af_packet.o entry
kernel BUG at page_alloc.c:82!
rpmd(20216): Kernel Bug 1
pc = [__free_pages_ok+288/1040]  ra = [__free_pages_ok+276/1040]  ps = 0000    Tainted: P 
pc = [<fffffc0000841b50>]  ra = [<fffffc0000841b44>]  ps = 0000    Tainted: P 
Using defaults from ksymoops -t elf64-alpha -a alpha
v0 = 000000000000001f  t0 = 0000000000000001  t1 = fffffc0016a0fec8
t2 = 0000000000000056  t3 = 0000000000000001  t4 = fffffc0000a3fcf8
t5 = fffffc0017d5a2c0  t6 = 0000000000000065  t7 = fffffc0008624000
a0 = 0000000000000000  a1 = 0000000000000001  a2 = 0000000000000001
a3 = 0000000000000001  a4 = 000000000000000e  a5 = 0000000000000001
t8 = fffffc0000a407f0  t9 = 0000000000004000  t10= fffffc0000a407f8
t11= fffffc0000a40808  pv = fffffc000081e1a0  at = 0000000000000000
gp = fffffc0000a3a8b8  sp = fffffc0008627d58
Trace:fffffc0000836160 fffffc0000835e28 fffffc0000836160 fffffc0000836338 fffffc00008364b4 fffffc0000865768 fffffc0000862e3c fffffc0000859c30 fffffc0000859dc0 fffffc0000810b10 
Code: 225f0052  a77da6e0  6b5b4083  27ba0020  23bd8d74  00000081 <a04a0000> 44481001 

>>PC;  fffffc0000841b50 <__free_pages_ok+120/410>   <=====
Trace; fffffc0000836160 <truncate_complete_page+90/b0>
Trace; fffffc0000835e28 <remove_inode_page+48/60>
Trace; fffffc0000836160 <truncate_complete_page+90/b0>
Trace; fffffc0000836338 <truncate_list_pages+1b8/2d0>
Trace; fffffc00008364b4 <truncate_inode_pages+64/d0>
Trace; fffffc0000865768 <iput+108/270>
Trace; fffffc0000862e3c <d_delete+7c/c0>
Trace; fffffc0000859c30 <vfs_unlink+190/230>
Trace; fffffc0000859dc0 <sys_unlink+f0/1f0>
Trace; fffffc0000810b10 <entSys+a8/c0>
Code;  fffffc0000841b38 <__free_pages_ok+108/410>
0000000000000000 <_PC>:
Code;  fffffc0000841b38 <__free_pages_ok+108/410>
   0:   52 00 5f 22       lda  a2,82(zero)
Code;  fffffc0000841b3c <__free_pages_ok+10c/410>
   4:   e0 a6 7d a7       ldq  t12,-22816(gp)
Code;  fffffc0000841b40 <__free_pages_ok+110/410>
   8:   83 40 5b 6b       jsr  ra,(t12),218 <_PC+0x218> fffffc0000841d50 <__free_pages_ok+320/410>
Code;  fffffc0000841b44 <__free_pages_ok+114/410>
   c:   20 00 ba 27       ldah gp,32(ra)
Code;  fffffc0000841b48 <__free_pages_ok+118/410>
  10:   74 8d bd 23       lda  gp,-29324(gp)
Code;  fffffc0000841b4c <__free_pages_ok+11c/410>
  14:   81 00 00 00       call_pal     0x81
Code;  fffffc0000841b50 <__free_pages_ok+120/410>   <=====
  18:   00 00 4a a0       ldl  t1,0(s1)   <=====
Code;  fffffc0000841b54 <__free_pages_ok+124/410>
  1c:   01 10 48 44       and  t1,0x40,t0


1 warning issued.  Results may not be reliable.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: pre6 BUG oops
  2001-10-31 17:25 pre6 BUG oops Jeff Garzik
@ 2001-10-31 17:41 ` David S. Miller
  2001-10-31 18:19   ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: David S. Miller @ 2001-10-31 17:41 UTC (permalink / raw)
  To: jgarzik; +Cc: linux-kernel

   From: Jeff Garzik <jgarzik@mandrakesoft.com>
   Date: Wed, 31 Oct 2001 12:25:21 -0500

   2.4.14-pre6 on UP alpha, newly reformatted and reinstalled :)
   
   Machine was nowhere near OOM, and no other adverse messages appeared in
   dmesg.  An rpm build and an rpm install were running in parallel.
   
Hmmm... the oops suggests that truncate_complete_page() gets a page
not in the page cache.  The page count should be at least 2 when
it gets passed there.

Franks a lot,
David S. Miller
davem@redhat.com


* Re: pre6 BUG oops
  2001-10-31 17:41 ` David S. Miller
@ 2001-10-31 18:19   ` Linus Torvalds
  2001-10-31 18:30     ` Jeff Garzik
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2001-10-31 18:19 UTC (permalink / raw)
  To: linux-kernel

In article <20011031.094112.125896630.davem@redhat.com>,
David S. Miller <davem@redhat.com> wrote:
>   From: Jeff Garzik <jgarzik@mandrakesoft.com>
>   Date: Wed, 31 Oct 2001 12:25:21 -0500
>
>   2.4.14-pre6 on UP alpha, newly reformatted and reinstalled :)
>   
>   Machine was nowhere near OOM, and no other adverse messages appeared in
>   dmesg.  An rpm build and an rpm install were running in parallel.
>   
>Hmmm... the oops suggests that truncate_complete_page() gets a page
>not in the page cache.  The page count should be at least 2 when
>it gets passed there.

Indeed.  Looks like somebody removed the page from the page cache
without removing it from the inode queue. 

Which sounds rather impossible, as the two are always done together.

Maybe it's just the page count that is buggered, and we free it too
early as a result.  Is this the same machine that had interesting
trouble before?

		Linus


* Re: pre6 BUG oops
  2001-10-31 18:19   ` Linus Torvalds
@ 2001-10-31 18:30     ` Jeff Garzik
  2001-10-31 19:53       ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Garzik @ 2001-10-31 18:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> Maybe it's just the page count that is buggered, and we free it too
> early as a result.  Is this the same machine that had interesting
> trouble before?

yes, a UP alpha running 2.4.14-pre6; it was described in the false OOM
killer report.

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno



* Re: pre6 BUG oops
  2001-10-31 18:30     ` Jeff Garzik
@ 2001-10-31 19:53       ` Linus Torvalds
  2001-10-31 21:48         ` 2.4.13 (and 2.4.5) performance drops under load and does not recover David Lang
  2001-11-03 20:46         ` pre6 BUG oops Richard Henderson
  0 siblings, 2 replies; 10+ messages in thread
From: Linus Torvalds @ 2001-10-31 19:53 UTC (permalink / raw)
  To: linux-kernel

In article <3BE04338.8F0AF9D4@mandrakesoft.com>,
Jeff Garzik  <jgarzik@mandrakesoft.com> wrote:
>Linus Torvalds wrote:
>> Maybe it's just the page count that is buggered, and we free it too
>> early as a result.  Is this the same machine that had interesting
>> trouble before?
>
>yes, a UP alpha running 2.4.14-pre6, that was described in the false oom
>killer report.

Ok, I think it's the same bug, and that we're just freeing the wrong
page somehow. 

I wonder if the "VALID_PAGE(page)" macro is reliable on alpha? You seem
to be able to trigger this too easily for it not to be something
specific to the setup..

		Linus


* 2.4.13 (and 2.4.5) performance drops under load and does not recover
  2001-10-31 19:53       ` Linus Torvalds
@ 2001-10-31 21:48         ` David Lang
  2001-10-31 23:22           ` David Lang
  2001-11-03 20:46         ` pre6 BUG oops Richard Henderson
  1 sibling, 1 reply; 10+ messages in thread
From: David Lang @ 2001-10-31 21:48 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

symptoms: when the firewall is freshly booted it supports >200
connections/sec; when the firewall is hit with lots of simultaneous
requests it slows down, and after the flood of requests stops and the
machine sits idle it never recovers back to its original performance.
I can drive the performance down to <50 connections/sec permanently.
The CPU is eaten up more and more by the system, leaving less time for
userspace.

the single-threaded requests do outperform the massively parallel test
(as can be expected, since the scheduler isn't optimized for long
runqueues :-) I am not asking to optimize things for the large
runqueue, just to be able to recover after the long runqueue has
finished.

killing and restarting the proxy does not fix the problem. The only
way I have discovered to restore performance is a reboot of the
firewall.

test setup:

client---firewall---server

all boxes are 1.2GHz Athlon, 512MB RAM, IDE drive, D-Link quad
ethernet, 2GB swap space (although it never comes close to using it)

ethernet switches are cisco 3548

firewall is running 4 proxies; each proxy accepts incoming
connections, forks to handle that connection, relays the data to/from
the server, and then exits. (yes, I know this is not the most
efficient way to do things, but this _is_ a real world use)

I did a vmstat 1 on the firewall during the test, and excerpts from it
are listed below.

server is running Apache (MaxClients set to 250 after prior runs noted
that it ran up against the 150 limit)

client is running ab (apache benchmark tool)

testing procedure:

1. boot firewall

2. get performance data
run ab -n 400 firewall:port/index.html.en twice (first to get past
initialization delays, second to get a performance value).
This does 400 sequential connections and reports the time to get the
responses

3. flood the firewall
run 5 copies of ab -n 5000 -c 200 firewall:port/index.html.en
this does 5000 connections attempting to make up to 200 connections at a
time

4. wait for all the connections to clear on the firewall (at a
minimum; possibly longer if I am in a meeting). The idea here is to
return everything to a stable mode.

5. run ab -n 400 firewall:port/index.html and record the connections/sec

loop back to 3.

after 4 iterations of load reboot the firewall and do two more runs of ab
-n 400 to make sure that performance is back up and the slowdown is not
due to large logfiles anywhere.

at the start of the test vmstat looks like this


ab -n 200 (219 connections/sec) this is two runs sequentially
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 496744    604   8076   0   0     0     0  101    11   0   0 100
 1  0  1      0 496548    604   8124   0   0    48     0 3814  1759  36  41  24
 0  0  0      0 496588    604   8124   0   0     0     0 4007  1899  42  34  25
 0  0  0      0 496576    604   8128   0   0     0     0  103    16   0   0 100
 1  0  0      0 496452    604   8128   0   0     0     0  924   405  11   6  83
 2  0  0      0 496360    604   8128   0   0     0     0 4252  2015  41  40  20
 0  0  0      0 496492    604   8128   0   0     0     0 2628  1240  26  21  53
 0  0  0      0 496492    604   8128   0   0     0     0  101     7   0   0 100


5xab -n 5000 -c 200 (~210 connections/sec)
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
19  0  1      0 493192    604   8124   0   0     0     0 5938  1139  55  45   0
13  0  1      0 492980    604   8124   0   0     0     0 4956  1059  58  42   0
11  0  1      0 492692    604   8124   0   0     0     0 4878  1057  45  55   0
17  0  1      0 492900    604   8124   0   0     0     0 4890  1072  57  43   0
29  0  1      0 492572    604   8124   0   0     0     0 4648  1049  46  54   0
13  0  1      0 492608    604   8124   0   0     0     0 4650  1041  50  50   0
11  0  1      0 492484    604   8124   0   0     0     0 4645  1043  38  62   0
18  0  1      0 492472    604   8128   0   0     0     0 4779  1029  56  44   0
17  0  1      0 492440    604   8128   0   0     0     0 4691  1057  46  54   0
18  0  1      0 492476    604   8128   0   0     0     0 4598  1074  54  46   0
18  0  1      0 492488    604   8128   0   0     0     0 4625  1051  53  47   0
22  0  1      0 492388    604   8128   0   0     0     0 4661  1057  50  50   0
10  0  1      0 492448    604   8128   0   0     0     0 4569  1033  56  44   0
22  0  1      0 492364    604   8128   0   0     0     0 4589  1036  48  52   0
18  0  1      0 492384    604   8128   0   0     0     0 4536  1031  48  52   0
15  0  1      0 492236    604   8128   0   0     0     0 4528  1034  43  57   0
26  0  1      0 492132    604   8128   0   0     0     0 4554  1037  50  50   0
24  0  1      0 492016    604   8128   0   0     0     0 4518  1037  48  52   0

at the end of 4 runs

5xab -n 5000 -c 200 (~40 connections/sec)
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
23  0  1      0 482356    624   8524   0   0     0     0 1053   181   7  93   0
29  0  1      0 481724    624   8528   0   0     0     0  960   206   7  93   0
23  0  1      0 482300    624   8528   0   0     0     0 1059   176   5  95   0
30  0  1      0 482104    624   8528   0   0     0    72 1095   197   9  91   0
19  0  1      0 482760    624   8528   0   0     0     0 1014   179   5  95   0
31  0  1      0 481892    624   8528   0   0     0     0 1010   198   6  94   0
16  0  1      0 482488    624   8524   0   0     0     0 1001   176   8  92   0
29  0  1      0 482236    624   8524   0   0     0     0 1037   179   9  91   0
12  0  1      0 483008    624   8528   0   0     0     0 1182   188   8  92   0
25  0  1      0 482620    624   8528   0   0     0     0  988   173   8  92   0
19  0  1      0 482696    624   8528   0   0     0     0  931   173   7  93   0
20  0  1      0 482776    624   8528   0   0     0     0  985   171   9  91   0
20  0  1      0 482116    624   8528   0   0     0     0 1122   119   5  95   0
21  0  1      0 482888    624   8528   0   0     0     0  830   144   3  97   0
16  0  1      0 481916    624   8528   0   0     0     0 1054   155   9  91   0
18  0  1      0 482892    624   8528   0   0     0     0  875   143  12  88   0
20  0  1      0 483100    624   8528   0   0     0     0  875   146   7  93   0
27  0  1      0 482428    624   8528   0   0     0     0  859   153   6  94   0
26  0  1      0 482688    624   8528   0   0     0     0  874   151   8  92   0

ab -n 400 (~48 connections/sec) this is a single run
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 0  0  0      0 483552    624   8544   0   0     0     0  103    13   0   0 100
 2  0  0      0 483388    624   8544   0   0     0     0  592   225  10  38  52
 1  0  0      0 483392    624   8544   0   0     0     0 1088   488  13  87   0
 2  0  0      0 483392    624   8544   0   0     0     0 1108   468  10  89   1
 1  0  0      0 483404    624   8548   0   0     0     0 1081   452   9  89   2
 1  0  0      0 483508    624   8548   0   0     0     0 1096   486   7  90   3
 2  0  1      0 483368    624   8548   0   0     0     0 1091   458  11  87   2
 1  0  0      0 483364    624   8548   0   0     0     0 1085   480  13  85   2
 1  0  0      0 483356    624   8552   0   0     0     0 1098   478   9  87   4
 0  0  0      0 483520    624   8548   0   0     0     0  796   330   9  59  32
 0  0  0      0 483520    624   8548   0   0     0     0  103    13   0   0 100

Please tell me what additional things I should try to identify what the
kernel is getting tied up doing.

I was able to duplicate this problem with 2.4.5 as well as 2.4.13.
The performance of 2.4.5 is ~4% lower than that of 2.4.13 in all
cases.

David Lang


* Re: 2.4.13 (and 2.4.5) performance drops under load and does not recover
  2001-10-31 21:48         ` 2.4.13 (and 2.4.5) performance drops under load and does not recover David Lang
@ 2001-10-31 23:22           ` David Lang
  2001-11-01 21:37             ` David Lang
  0 siblings, 1 reply; 10+ messages in thread
From: David Lang @ 2001-10-31 23:22 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

This problem is also happening on 2.4.14-pre6.

however, one thing that I did find:

in this firewall setup I am using ipchains REDIRECT to map port 1433
on four IP addresses to ports 1434-1437 to get to the proxies.

if I do the same test, but instead of having all 5 copies of ab
hitting port 1433 I make them connect to the real port of the proxy,
the slowdown does not happen (I have run 5 iterations of the load test
without any slowdown)

So this slowdown appears to be in the ipchains emulation code or the
iptables port mapping code.

I think my next step is going to be to try to use iptables instead of
ipchains to do the port mapping.

David Lang

On Wed, 31 Oct 2001, David Lang wrote:

> Date: Wed, 31 Oct 2001 13:48:43 -0800 (PST)
> From: David Lang <david.lang@digitalinsight.com>
> To: torvalds@transmeta.com
> Cc: linux-kernel@vger.kernel.org
> Subject: 2.4.13 (and 2.4.5) performance drops under load and does not
>     recover
>
> [rest of previous message and vmstat data snipped]


* Re: 2.4.13 (and 2.4.5) performance drops under load and does not recover
  2001-10-31 23:22           ` David Lang
@ 2001-11-01 21:37             ` David Lang
  2001-11-01 22:12               ` ip tables performace problem was " David Lang
  0 siblings, 1 reply; 10+ messages in thread
From: David Lang @ 2001-11-01 21:37 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, Rusty.Russell

testing this with iptables shows a similar performance drop for the
first few thousand connections, but then it stabilizes and does not
get worse.  After the firewall is allowed to sit idle long enough for
/proc/net/ip_conntrack to drop back down to zero, the performance
recovers to the freshly booted state.

in doing this test I ran into the 32K limit of connection tracking and
bumped it to 327680 (just to make it large enough :-)

so the ipchains compatibility layer appears to have a leak in it that
prevents it from recovering the memory used to track the connections.

under iptables, is there a way to cause the TCP connections to time
out of ip_conntrack more rapidly? it appears that they sit in the TCP
table for the TIME_WAIT interval when closed, and then sit in the
ip_conntrack table for an additional TIME_WAIT interval as well.

David Lang

On Wed, 31 Oct 2001, David Lang wrote:

> Date: Wed, 31 Oct 2001 15:22:03 -0800 (PST)
> From: David Lang <david.lang@digitalinsight.com>
> To: torvalds@transmeta.com
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: 2.4.13 (and 2.4.5) performance drops under load and does not
>     recover
>
> [rest of previous messages and quoted vmstat data snipped]
> > 24  0  1      0 492016    604   8128   0   0     0     0 4518  1037  48  52   0
> >
> > at the end of 4 runs
> >
> > 5xab -n 5000 -c 200 (~40 connections/sec)
> >    procs                      memory    swap          io     system         cpu
> >  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
> > 23  0  1      0 482356    624   8524   0   0     0     0 1053   181   7  93   0
> > 29  0  1      0 481724    624   8528   0   0     0     0  960   206   7  93   0
> > 23  0  1      0 482300    624   8528   0   0     0     0 1059   176   5  95   0
> > 30  0  1      0 482104    624   8528   0   0     0    72 1095   197   9  91   0
> > 19  0  1      0 482760    624   8528   0   0     0     0 1014   179   5  95   0
> > 31  0  1      0 481892    624   8528   0   0     0     0 1010   198   6  94   0
> > 16  0  1      0 482488    624   8524   0   0     0     0 1001   176   8  92   0
> > 29  0  1      0 482236    624   8524   0   0     0     0 1037   179   9  91   0
> > 12  0  1      0 483008    624   8528   0   0     0     0 1182   188   8  92   0
> > 25  0  1      0 482620    624   8528   0   0     0     0  988   173   8  92   0
> > 19  0  1      0 482696    624   8528   0   0     0     0  931   173   7  93   0
> > 20  0  1      0 482776    624   8528   0   0     0     0  985   171   9  91   0
> > 20  0  1      0 482116    624   8528   0   0     0     0 1122   119   5  95   0
> > 21  0  1      0 482888    624   8528   0   0     0     0  830   144   3  97   0
> > 16  0  1      0 481916    624   8528   0   0     0     0 1054   155   9  91   0
> > 18  0  1      0 482892    624   8528   0   0     0     0  875   143  12  88   0
> > 20  0  1      0 483100    624   8528   0   0     0     0  875   146   7  93   0
> > 27  0  1      0 482428    624   8528   0   0     0     0  859   153   6  94   0
> > 26  0  1      0 482688    624   8528   0   0     0     0  874   151   8  92   0
> >
> > ab -n 400 (~48 connections/sec) this is a single run
> >    procs                      memory    swap          io     system         cpu
> >  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
> >  0  0  0      0 483552    624   8544   0   0     0     0  103    13   0   0 100
> >  2  0  0      0 483388    624   8544   0   0     0     0  592   225  10  38  52
> >  1  0  0      0 483392    624   8544   0   0     0     0 1088   488  13  87   0
> >  2  0  0      0 483392    624   8544   0   0     0     0 1108   468  10  89   1
> >  1  0  0      0 483404    624   8548   0   0     0     0 1081   452   9  89   2
> >  1  0  0      0 483508    624   8548   0   0     0     0 1096   486   7  90   3
> >  2  0  1      0 483368    624   8548   0   0     0     0 1091   458  11  87   2
> >  1  0  0      0 483364    624   8548   0   0     0     0 1085   480  13  85   2
> >  1  0  0      0 483356    624   8552   0   0     0     0 1098   478   9  87   4
> >  0  0  0      0 483520    624   8548   0   0     0     0  796   330   9  59  32
> >  0  0  0      0 483520    624   8548   0   0     0     0  103    13   0   0 100
> >
> > Please tell me what additional things I should try to identify what the
> > kernel is getting tied up doing.
> >
> > I was able to duplicate this problem with 2.4.5 as well as 2.4.13. the
> > performance of 2.4.5 is ~4% lower than 2.4.13 in all cases.
> >
> > David Lang
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >


* ip tables performace problem was Re: 2.4.13 (and 2.4.5) performance drops under load and does not recover
  2001-11-01 21:37             ` David Lang
@ 2001-11-01 22:12               ` David Lang
  0 siblings, 0 replies; 10+ messages in thread
From: David Lang @ 2001-11-01 22:12 UTC (permalink / raw)
  To: Rusty.Russell; +Cc: linux-kernel

in further investigation of my options for this firewall I considered
replacing the user-space proxy with NAT (with the idea that it would be
faster than the fork-for-every-connection proxy)

to do this I reconfigured the setup below to forward packets through the
firewall and set up routes on the client and server to talk to each other.

I then started my flood of http requests from the client to the server (5x
ab -n 5000 -c 200)

with 2.4.14pre6 iptables, connection tracking turned on, and no NAT rules used,
performance was ~1800 connections/sec for the first few seconds (about
the limit of the server) and then plummeted down to between 40-120
connections/sec (very sporadic), and the firewall bogged down so badly that
a vmstat 1 was dropping 2 out of every 3 seconds of output. doing a wc -l on
/proc/net/ip_conntrack (which took 45 or so seconds to complete) reported
~17000 (doing this in the test from my last e-mail with the proxy reported
~38000) and vmstat was showing a consistent 100% system.
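The counting step can be wrapped in a tiny helper so the same check works against the live /proc/net/ip_conntrack or a saved snapshot (the snapshot contents below are fabricated for illustration):

```shell
# One line in /proc/net/ip_conntrack is one tracked connection, so the
# count used in the text is just wc -l.  The optional argument lets the
# helper run against a saved copy instead of the live procfs file.
conntrack_count() {
    wc -l < "${1:-/proc/net/ip_conntrack}"
}

# demo against a fabricated three-entry snapshot
snap=$(mktemp)
printf 'tcp      6 119 TIME_WAIT src=10.0.0.2 dst=10.0.0.3 sport=%s dport=1433\n' \
    40001 40002 40003 > "$snap"
conntrack_count "$snap"   # prints 3
rm -f "$snap"
```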

with 2.4.14pre6 with no iptables compiled into the system, the same
test maintained a steady ~1800 connections/sec while using ~50% of
the CPU on the firewall

so there is a serious bottleneck in the connection tracking code that
makes it so that under load it's better to run the ancient user-space
proxy than to use the iptables NAT.

David Lang



 On Thu, 1 Nov 2001, David Lang wrote:

> Date: Thu, 1 Nov 2001 13:37:05 -0800 (PST)
> From: David Lang <david.lang@digitalinsight.com>
> To: torvalds@transmeta.com
> Cc: linux-kernel@vger.kernel.org, Rusty.Russell@rustcorp.com.au
> Subject: Re: 2.4.13 (and 2.4.5) performance drops under load and does not
>     recover
>
> testing this with iptables shows a similar performance drop for the first
> few thousand connections, but then it stabilizes and does not get worse.
> After the firewall is allowed to sit idle long enough for
> /proc/net/ip_conntrack to drop back down to zero, the performance recovers
> to the freshly booted state.
>
> in doing this test I ran into the 32K limit of connection tracking and
> bumped it to 327680 (just to make it large enough :-)
>
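(For reference, the bump described above is a procfs write; on 2.4 kernels with connection tracking loaded the tunable is /proc/sys/net/ipv4/ip_conntrack_max, written as root:)

```shell
# inspect the current connection-tracking limit, then raise it
cat /proc/sys/net/ipv4/ip_conntrack_max
echo 327680 > /proc/sys/net/ipv4/ip_conntrack_max
```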
> so the ipchains compatibility layer appears to have a leak in it that
> prevents it from recovering the memory used to track the connections.
>
> under iptables is there a way to cause the tcp connections to time out from
> the ip_conntrack table more rapidly? it appears that they sit in the tcp table
> for the time_wait interval when closed and then sit in the ip_conntrack
> table for an additional time_wait interval as well.
>
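(An aside on the timeout question: stock kernels of this era hard-code the conntrack TCP timeouts in ip_conntrack_proto_tcp.c, so they can only be changed by patching; later kernels expose them as sysctls. Assuming such a kernel, shortening the post-close tracking would look like the following; the path and value are assumptions to verify on the target box:)

```shell
# hypothetical: shorten how long closed connections linger in the
# conntrack table (seconds); only on kernels exposing this sysctl
echo 30 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_time_wait
```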
> David Lang
>
> On Wed, 31 Oct 2001, David Lang wrote:
>
> > Date: Wed, 31 Oct 2001 15:22:03 -0800 (PST)
> > From: David Lang <david.lang@digitalinsight.com>
> > To: torvalds@transmeta.com
> > Cc: linux-kernel@vger.kernel.org
> > Subject: Re: 2.4.13 (and 2.4.5) performance drops under load and does not
> >     recover
> >
> > This problem is also happening on 2.4.14pre6.
> >
> > however, one thing that I did find:
> >
> > in this firewall setup I am using ipchains REDIRECT to map port 1433 on
> > four IP addresses to ports 1434-1437 to get to the proxies.
> >
> > if I do the same test, but instead of having all 5 copies of ab hitting
> > port 1433 I make them connect to the real port of the proxy, the slowdown
> > does not happen (I have run 5 iterations of the load test without any
> > slowdown)
> >
> > So this slowdown appears to be in the ipchains emulation code or the
> > iptables port mapping code.
> >
> > I think my next step is going to be to try to use iptables instead of
> > ipchains to do the port mapping.
> >
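The iptables version of that port mapping would be one nat-table REDIRECT rule per address (the addresses below are placeholders for the firewall's four IPs; syntax per iptables 1.2-era userspace):

```shell
# map port 1433 on each proxied address to that proxy's real port
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.1 --dport 1433 -j REDIRECT --to-ports 1434
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.2 --dport 1433 -j REDIRECT --to-ports 1435
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.3 --dport 1433 -j REDIRECT --to-ports 1436
iptables -t nat -A PREROUTING -p tcp -d 10.0.0.4 --dport 1433 -j REDIRECT --to-ports 1437
```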
> > David Lang
> >
> > On Wed, 31 Oct 2001, David Lang wrote:
> >
> > > Date: Wed, 31 Oct 2001 13:48:43 -0800 (PST)
> > > From: David Lang <david.lang@digitalinsight.com>
> > > To: torvalds@transmeta.com
> > > Cc: linux-kernel@vger.kernel.org
> > > Subject: 2.4.13 (and 2.4.5) performance drops under load and does not
> > >     recover
> > >
> > > symptoms: when the firewall is freshly booted it supports >200
> > > connections/sec; when the firewall is hit with lots of simultaneous
> > > requests it slows down, after the flood of requests stops and the machine
> > > sits idle it never recovers back to its original performance. I can
> > > drive the performance down to <50 connections/sec permanently. The CPU
> > > is eaten up more and more by the system, leaving less time for userspace.
> > >
> > > the single threaded requests do outperform the massively parallel test (as
> > > can be expected as the scheduler isn't optimized for long runqueues :-) I
> > > am not asking to optimize things for the large runqueue, just to be able
> > > to recover after the long runqueue has finished.
> > >
> > > killing and restarting the proxy does not fix the problem. The only way I
> > > have discovered to restore performance is a reboot of the firewall.
> > >
> > > test setup:
> > >
> > > client---firewall---server
> > >
> > > all boxes are 1.2GHz athlon, 512MB ram, IDE drive, D-link quad ethernet,
> > > 2GB swap space (although it never comes close to using it)
> > >
> > > ethernet switches are cisco 3548
> > >
> > > firewall is running 4 proxies, each proxy accepts incoming connections,
> > > forks to handle that connection, relays the data to/from the server and
> > > then exits. (yes I know this is not the most efficient way to do things,
> > > but this _is_ a real world use)
> > >
> > > I did a vmstat 1 on the firewall during the test and excerpts from it are
> > > listed below.
> > >
> > > server is running apache (maxclients set to 250 after prior runs noted
> > > that it ran up against the 150 limit)
> > >
> > > client is running ab (apache benchmark tool)
> > >
> > > testing procedure:
> > >
> > > 1. boot firewall
> > >
> > > 2. get performance data
> > > run ab -n 400 firewall:port/index.html.en twice (first to get past
> > > initialization delays, second to get a performance value)
> > > This does 400 sequential connections and reports the time to get the
> > > responses
> > >
> > > 3. flood the firewall
> > > run 5 copies of ab -n 5000 -c 200 firewall:port/index.html.en
> > > this does 5000 connections attempting to make up to 200 connections at a
> > > time
> > >
> > > 4. wait for all the connections to clear on the firewall (at least,
> > > possibly longer if I am in a meeting). The idea here is to return
> > > everything to a stable mode.
> > >
> > > 5. run ab -n 400 firewall:port/index.html and record the connections/sec
> > >
> > > loop back to 3.
> > >
> > > after 4 iterations of load reboot the firewall and do two more runs of ab
> > > -n 400 to make sure that performance is back up and the slowdown is not
> > > due to large logfiles anywhere.
> > >
> > > at the start of the test vmstat looks like this
> > >
> > >
> > > ab -n 200 (219 connections/sec) this is two runs sequentially
> > >    procs                      memory    swap          io     system         cpu
> > >  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
> > >  0  0  0      0 496744    604   8076   0   0     0     0  101    11   0   0 100
> > >  1  0  1      0 496548    604   8124   0   0    48     0 3814  1759  36  41  24
> > >  0  0  0      0 496588    604   8124   0   0     0     0 4007  1899  42  34  25
> > >  0  0  0      0 496576    604   8128   0   0     0     0  103    16   0   0 100
> > >  1  0  0      0 496452    604   8128   0   0     0     0  924   405  11   6  83
> > >  2  0  0      0 496360    604   8128   0   0     0     0 4252  2015  41  40  20
> > >  0  0  0      0 496492    604   8128   0   0     0     0 2628  1240  26  21  53
> > >  0  0  0      0 496492    604   8128   0   0     0     0  101     7   0   0 100
> > >
> > >
> > > 5xab -n 5000 -c 200 (~210 connections/sec)
> > >    procs                      memory    swap          io     system         cpu
> > >  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
> > > 19  0  1      0 493192    604   8124   0   0     0     0 5938  1139  55  45   0
> > > 13  0  1      0 492980    604   8124   0   0     0     0 4956  1059  58  42   0
> > > 11  0  1      0 492692    604   8124   0   0     0     0 4878  1057  45  55   0
> > > 17  0  1      0 492900    604   8124   0   0     0     0 4890  1072  57  43   0
> > > 29  0  1      0 492572    604   8124   0   0     0     0 4648  1049  46  54   0
> > > 13  0  1      0 492608    604   8124   0   0     0     0 4650  1041  50  50   0
> > > 11  0  1      0 492484    604   8124   0   0     0     0 4645  1043  38  62   0
> > > 18  0  1      0 492472    604   8128   0   0     0     0 4779  1029  56  44   0
> > > 17  0  1      0 492440    604   8128   0   0     0     0 4691  1057  46  54   0
> > > 18  0  1      0 492476    604   8128   0   0     0     0 4598  1074  54  46   0
> > > 18  0  1      0 492488    604   8128   0   0     0     0 4625  1051  53  47   0
> > > 22  0  1      0 492388    604   8128   0   0     0     0 4661  1057  50  50   0
> > > 10  0  1      0 492448    604   8128   0   0     0     0 4569  1033  56  44   0
> > > 22  0  1      0 492364    604   8128   0   0     0     0 4589  1036  48  52   0
> > > 18  0  1      0 492384    604   8128   0   0     0     0 4536  1031  48  52   0
> > > 15  0  1      0 492236    604   8128   0   0     0     0 4528  1034  43  57   0
> > > 26  0  1      0 492132    604   8128   0   0     0     0 4554  1037  50  50   0
> > > 24  0  1      0 492016    604   8128   0   0     0     0 4518  1037  48  52   0
> > >
> > > at the end of 4 runs
> > >
> > > 5xab -n 5000 -c 200 (~40 connections/sec)
> > >    procs                      memory    swap          io     system         cpu
> > >  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
> > > 23  0  1      0 482356    624   8524   0   0     0     0 1053   181   7  93   0
> > > 29  0  1      0 481724    624   8528   0   0     0     0  960   206   7  93   0
> > > 23  0  1      0 482300    624   8528   0   0     0     0 1059   176   5  95   0
> > > 30  0  1      0 482104    624   8528   0   0     0    72 1095   197   9  91   0
> > > 19  0  1      0 482760    624   8528   0   0     0     0 1014   179   5  95   0
> > > 31  0  1      0 481892    624   8528   0   0     0     0 1010   198   6  94   0
> > > 16  0  1      0 482488    624   8524   0   0     0     0 1001   176   8  92   0
> > > 29  0  1      0 482236    624   8524   0   0     0     0 1037   179   9  91   0
> > > 12  0  1      0 483008    624   8528   0   0     0     0 1182   188   8  92   0
> > > 25  0  1      0 482620    624   8528   0   0     0     0  988   173   8  92   0
> > > 19  0  1      0 482696    624   8528   0   0     0     0  931   173   7  93   0
> > > 20  0  1      0 482776    624   8528   0   0     0     0  985   171   9  91   0
> > > 20  0  1      0 482116    624   8528   0   0     0     0 1122   119   5  95   0
> > > 21  0  1      0 482888    624   8528   0   0     0     0  830   144   3  97   0
> > > 16  0  1      0 481916    624   8528   0   0     0     0 1054   155   9  91   0
> > > 18  0  1      0 482892    624   8528   0   0     0     0  875   143  12  88   0
> > > 20  0  1      0 483100    624   8528   0   0     0     0  875   146   7  93   0
> > > 27  0  1      0 482428    624   8528   0   0     0     0  859   153   6  94   0
> > > 26  0  1      0 482688    624   8528   0   0     0     0  874   151   8  92   0
> > >
> > > ab -n 400 (~48 connections/sec) this is a single run
> > >    procs                      memory    swap          io     system         cpu
> > >  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
> > >  0  0  0      0 483552    624   8544   0   0     0     0  103    13   0   0 100
> > >  2  0  0      0 483388    624   8544   0   0     0     0  592   225  10  38  52
> > >  1  0  0      0 483392    624   8544   0   0     0     0 1088   488  13  87   0
> > >  2  0  0      0 483392    624   8544   0   0     0     0 1108   468  10  89   1
> > >  1  0  0      0 483404    624   8548   0   0     0     0 1081   452   9  89   2
> > >  1  0  0      0 483508    624   8548   0   0     0     0 1096   486   7  90   3
> > >  2  0  1      0 483368    624   8548   0   0     0     0 1091   458  11  87   2
> > >  1  0  0      0 483364    624   8548   0   0     0     0 1085   480  13  85   2
> > >  1  0  0      0 483356    624   8552   0   0     0     0 1098   478   9  87   4
> > >  0  0  0      0 483520    624   8548   0   0     0     0  796   330   9  59  32
> > >  0  0  0      0 483520    624   8548   0   0     0     0  103    13   0   0 100
> > >
> > > Please tell me what additional things I should try to identify what the
> > > kernel is getting tied up doing.
> > >
> > > I was able to duplicate this problem with 2.4.5 as well as 2.4.13. the
> > > performance of 2.4.5 is ~4% lower than 2.4.13 in all cases.
> > >
> > > David Lang


* Re: pre6 BUG oops
  2001-10-31 19:53       ` Linus Torvalds
  2001-10-31 21:48         ` 2.4.13 (and 2.4.5) performance drops under load and does not recover David Lang
@ 2001-11-03 20:46         ` Richard Henderson
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2001-11-03 20:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Wed, Oct 31, 2001 at 07:53:37PM +0000, Linus Torvalds wrote:
> I wonder if the "VALID_PAGE(page)" macro is reliable on alpha? You seem
> to be able to trigger this too easily for it to not being something
> specific to the setup..

Should be.  I've never seen an alpha with discontiguous memory.
(Well, I've not looked at a numa box, but that's not at issue here.)

FWIW, we dump the HWRPB memory table early in the boot process, e.g.

memcluster 0, usage 1, start        0, end      192
memcluster 1, usage 0, start      192, end    16267
memcluster 2, usage 1, start    16267, end    16384

where "usage 1" is reserved by the console.


r~


end of thread, other threads:[~2001-11-03 20:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-10-31 17:25 pre6 BUG oops Jeff Garzik
2001-10-31 17:41 ` David S. Miller
2001-10-31 18:19   ` Linus Torvalds
2001-10-31 18:30     ` Jeff Garzik
2001-10-31 19:53       ` Linus Torvalds
2001-10-31 21:48         ` 2.4.13 (and 2.4.5) performance drops under load and does not recover David Lang
2001-10-31 23:22           ` David Lang
2001-11-01 21:37             ` David Lang
2001-11-01 22:12               ` ip tables performace problem was " David Lang
2001-11-03 20:46         ` pre6 BUG oops Richard Henderson
