* Poor UDP performance using 2.6.21-rc5-rt5
@ 2007-04-01 19:15 Dave Sperry
  2007-04-01 20:07 ` Nivedita Singhvi
  2007-04-02  7:21 ` Ingo Molnar
  0 siblings, 2 replies; 16+ messages in thread
From: Dave Sperry @ 2007-04-01 19:15 UTC (permalink / raw)
  To: linux-rt-users; +Cc: linux-kernel

Hi,
I have a dual-core Opteron machine that exhibits poor UDP performance 
(RT consumes more than 2x the CPU) with 2.6.21-rc5-rt5 as compared to 
2.6.21-rc5. Top shows the IRQ handler consuming a lot of CPU.

The motherboard is a Supermicro H8DME-2 with one dual-core Opteron 
installed. The networking is provided by the on-board nVidia MCP55 Pro chip.

The RT test was done using netperf 2.4.3, with the server on an IBM LS20 
blade running RHEL4U2 and the Supermicro running netperf under RHEL5 
with 2.6.21-rc5-rt5.

The non-RT test was done on the exact same setup, except vanilla 
2.6.21-rc5 was loaded on the Supermicro board.

Cyclesoak was used to measure CPU utilization in all cases.
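
For the record, each run was essentially the following two-terminal 
sequence on the Supermicro (netperf's companion daemon, netserver, is 
assumed to be already running on the 192.168.70.11 blade):

  # terminal 1: calibrate and report CPU load
  $ ./cyclesoak

  # terminal 2: 100-second UDP send test with 1025-byte messages
  $ netperf -l 100 -H 192.168.70.11 -t UDP_STREAM -- -m 1025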



Here are the RT results
#######################################################
## 2.6.21-rc5-rt5
#######################################################
$ netperf -l 100 -H 192.168.70.11 -t UDP_STREAM -- -m 1025
UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.70.11 (192.168.70.11) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

126976    1025   100.00    8676376      0     711.46
135168           100.00    8676376            711.46

########## cyclesoak during test
$ ./cyclesoak
using 2 CPUs
System load: -0.1%
System load: 40.5%
System load: 51.6%
System load: 51.5%
System load: 50.9%
System load: 50.7%
System load: 50.8%
System load: 50.7%
System load: 50.6%

######## top during test
top - 13:26:48 up 8 min,  4 users,  load average: 1.74, 0.46, 0.15
Tasks: 149 total,   4 running, 145 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.7%us, 16.8%sy, 50.6%ni,  0.0%id,  0.0%wa, 25.6%hi,  6.3%si,  0.0%st
Mem:   2035444k total,   465888k used,  1569556k free,    28840k buffers
Swap:  3068372k total,        0k used,  3068372k free,   318668k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3865 eadi      39  19  6804 1164  108 R  100  0.1   0:38.25 cyclesoak
 2715 root     -51  -5     0    0    0 S   51  0.0   0:09.52 IRQ-8406
 3867 eadi      25   0  6440  632  480 R   34  0.0   0:06.03 netperf       
   19 root     -51   0     0    0    0 S   13  0.0   0:02.33 softirq-net-tx/
 3866 eadi      39  19  6804 1164  108 R    1  0.1   0:20.47 cyclesoak
 3167 root      25   0 29888 1180  888 S    0  0.1   0:00.93 automount
 3861 eadi      15   0 12712 1076  788 R    0  0.1   0:00.19 top
    1 root      18   0 10308  668  552 S    0  0.0   0:00.67 init      
    2 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0   
    3 root      RT   0     0    0    0 S    0  0.0   0:00.00 posix_cpu_timer
    4 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-high/0
    5 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-timer/0
    6 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-net-tx/
    7 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-net-rx/
    8 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-block/0
 
##############################
The baseline results:
RHEL5 with 2.6.21-rc5 kernel
##############################

$  netperf -l 100 -H 192.168.70.11 -t UDP_STREAM -- -m 1025

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.70.11 (192.168.70.11) port 0 AF_INET
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

126976    1025   100.00    11405485      0     935.24
135168           100.00    11405485            935.24

#######################################
$ ./cyclesoak
using 2 CPUs
System load:  7.6%
System load: 29.6%
System load: 29.6%
System load: 28.9%
System load: 24.9%
System load: 25.0%
System load: 24.8%
System load: 24.9%

#######################################
top - 13:52:22 up 10 min,  6 users,  load average: 1.46, 0.43, 0.17
Tasks: 118 total,   4 running, 114 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  9.8%sy, 75.7%ni,  0.0%id,  0.0%wa,  5.8%hi,  8.1%si,  0.0%st
Mem:   2057200k total,   459128k used,  1598072k free,    29020k buffers
Swap:  3068372k total,        0k used,  3068372k free,   318968k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3882 eadi      39  19  6804 1164  108 R  100  0.1   0:52.11 cyclesoak
 3881 eadi      39  19  6804 1164  108 R   65  0.1   0:38.47 cyclesoak
 3883 eadi      15   0  6436  632  480 R   35  0.0   0:18.26 netperf
 3879 eadi      15   0 12580 1052  788 R    0  0.1   0:00.15 top
    1 root      18   0 10308  664  552 S    0  0.0   0:00.48 init      
    2 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.01 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
 
Any thoughts on how to fix this?

Thanks,
-Dave

* Re: Poor UDP performance using 2.6.21-rc5-rt5
@ 2007-04-02 14:09 dave_sperry@ieee.org
  2007-04-02 14:23 ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: dave_sperry@ieee.org @ 2007-04-02 14:09 UTC (permalink / raw)
  To: Ingo Molnar, Dave Sperry; +Cc: linux-rt-users, linux-kernel

Thanks for all the input, Ingo. Here's a list of all the permutations I've tried:

setup                    Thruput      CPU% from cyclesoak
                         (Mbit/s)
2.6.21-rc5 vanilla       935          29%

2.6.21-rc5-rt5           711          50% //basically all of 1 cpu

2.6.21-rc5-rt8           733          52%

2.6.21-rc5-rt8           824          64%
   netperf @50        
   hardirq @50
   softirq @50

2.6.21-rc5-rt8           937          74%
   netperf @51
   hardirq @50
   softirq @50

2.6.21-rc5-rt8           106          8%
   netperf @51
   hardirq @49
   softirq @50

2.6.21-rc5-rt8           233          14%
   netperf @51
   hardirq @49
   softirq @48

2.6.21-rc5-rt8           67           5%
   netperf @batch
   hardirq @batch
   softirq @batch

2.6.21-rc5-rt8           331           OFF
   netperf @batch
   hardirq @batch
   softirq @batch
   cyclesoak off   
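
A sketch of how the priority settings above would be applied with chrt, 
per Ingo's suggestion quoted below; the PIDs are the ones from the top 
output in the first message and will differ per boot:

  $ chrt -f -p 51 $$      # this shell, and thus netperf, at SCHED_FIFO 51
  $ chrt -f -p 50 2715    # IRQ-8406, the NIC hardirq thread
  $ chrt -f -p 50 19      # softirq-net-tx (per-CPU; its twin is PID 6)
  $ chrt -f -p 50 7       # softirq-net-rx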


Any thoughts?

-Dave



 -------------- Original message ----------------------
From: Ingo Molnar <mingo@elte.hu>
> 
> * Dave Sperry <dave_sperry@ieee.org> wrote:
> 
> > I checked the clock source in both the vanilla and rt cases, and 
> > they were both acpi_pm
> 
> ok, thanks for double-checking that.
> 
> > Here's the oprofile for my vanilla case:
> 
> i tried your workload and i think i managed to optimize it some more: i 
> have uploaded the -rt8 kernel with these improvements included - could 
> you try it? Is there any measurable improvement relative to -rt5?
> 
> one more thing to improve netperf performance is to do this before 
> running it:
> 
>   chrt -f -p 50 $$
> 
> this will put netperf on the same priority level as the net hardirq and 
> the net softirq (which both default to SCHED_FIFO:50), and should result 
> in a (much) reduced context-switch rate.
> 
> Or, if networking is not latency-critical, then you could move the net 
> hardirq and softirq threads to SCHED_BATCH, and run netperf under 
> SCHED_BATCH as well, using:
> 
>   chrt -b -p 0 $$
> 
> and figuring out the active softirq hardirq thread PIDs and "chrt -b" 
> -ing them too.
> 
> 	Ingo
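
[A sketch of the SCHED_BATCH variant under the same assumptions; the 
thread names come from the top output in the first message, and the 
pgrep pattern is an assumption:]

  $ chrt -b -p 0 $$       # this shell, and thus netperf, to SCHED_BATCH
  $ for pid in $(pgrep 'IRQ-8406|softirq-net'); do chrt -b -p 0 $pid; done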


* Re: Poor UDP performance using 2.6.21-rc5-rt5
@ 2007-04-02 17:17 dave_sperry@ieee.org
  0 siblings, 0 replies; 16+ messages in thread
From: dave_sperry@ieee.org @ 2007-04-02 17:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Dave Sperry, linux-rt-users, linux-kernel


 -------------- Original message ----------------------
From: Ingo Molnar <mingo@elte.hu>
> 
> * dave_sperry@ieee.org <dasperry@comcast.net> wrote:
> 
> > Thanks for all the input Ingo, Here's a list of all the permutations 
> > I've tried:
> 
> one thing i noticed is that cyclesoak interferes with the netperf 
> workload. This is quite surprising. 'top' and 'vmstat' output is 
> reliable on -rt, and the overhead goes down if i stop cyclesoak.
> 
> 	Ingo

I see the same effect:

setup                    Thruput      CPU% (cyclesoak)
                         (Mbit/s)
cyclesoak on
2.6.21-rc5-rt8           937          74%
   netperf @51
   hardirq @50
   softirq @50

cyclesoak off
2.6.21-rc5-rt8           938          65%
   netperf @51
   hardirq @50
   softirq @50


Dave

* Re: Poor UDP performance using 2.6.21-rc5-rt5
@ 2007-04-02 17:50 dave_sperry@ieee.org
  2007-04-02 19:04 ` Ingo Molnar
  0 siblings, 1 reply; 16+ messages in thread
From: dave_sperry@ieee.org @ 2007-04-02 17:50 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Dave Sperry, linux-rt-users, linux-kernel

I have a new data point to add to the confusion.

The box I'm testing has two NICs in it. The one I have been testing so far is:

00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)

The other NIC is an Intel 1-gigabit fiber controller:

09:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
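
Both lines above are lspci output; to double-check which driver is 
bound to each port, something like this would do (the interface names 
are assumptions):

  $ ethtool -i eth0       # nVidia port: driver should be forcedeth
  $ ethtool -i eth1       # Intel port:  driver should be e1000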

The Intel controller did not work under 2.6.20, but under 2.6.21-rc5-rt8 it does.

Running the same netperf tests with the Intel NIC produces very different results for the RT case:


setup                    Thruput      CPU% (cyclesoak)
                         (Mbit/s)
Nvidia
2.6.21-rc5 vanilla       935          29%
   netperf @51
   hardirq @50
   softirq @50

Intel
2.6.21-rc5 vanilla       933          30%
   netperf @51
   hardirq @50
   softirq @50

##################################################
Nvidia
2.6.21-rc5-rt8           938          65%
   netperf @51
   hardirq @50
   softirq @50

Intel
2.6.21-rc5-rt8           938          34%
   netperf @51
   hardirq @50
   softirq @50

The Intel NIC seems to behave better under RT.

top for each NIC under test yields some interesting results:

Nvidia 2.6.21-rc5-rt8:
top - 11:03:46 up  1:23,  2 users,  load average: 1.75, 1.31, 0.70
Tasks: 167 total,   2 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us, 29.7%sy,  0.0%ni, 34.8%id,  0.0%wa, 22.1%hi, 10.6%si,  0.0%st
Mem:   2035436k total,   618276k used,  1417160k free,    31744k buffers
Swap:  3068372k total,        0k used,  3068372k free,   386060k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4139 root     -52   0  6440  636  480 S   60  0.0   0:19.14 netperf
 2706 root     -51  -5     0    0    0 R   44  0.0   2:00.70 IRQ-8406
   19 root     -51   0     0    0    0 S   21  0.0   0:38.50 softirq-net-tx/
 4012 eadi      16   0  229m 6852 5584 S    1  0.3   0:05.39 multiload-apple
    6 root     -51   0     0    0    0 S    1  0.0   0:00.95 softirq-net-tx/
 3014 dbus      15   0 21352  896  548 S    1  0.0   0:00.04 dbus-daemon
    1 root      15   0 10308  668  552 S    0  0.0   0:00.74 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    3 root      RT   0     0    0    0 S    0  0.0   0:00.00 posix_cpu_timer
    4 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-high/0
    5 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-timer/0
    7 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-net-rx/
    8 root     -51   0     0    0    0 S    0  0.0   0:00.01 softirq-block/0

########################
Intel 2.6.21-rc5-rt8
top - 11:10:27 up 3 min,  2 users,  load average: 0.00, 0.00, 0.00
Tasks: 167 total,   1 running, 166 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us, 21.6%sy,  0.0%ni, 65.9%id,  0.0%wa,  3.0%hi,  9.0%si,  0.0%st
Mem:   2035436k total,   618012k used,  1417424k free,    29972k buffers
Swap:  3068372k total,        0k used,  3068372k free,   386084k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3955 root     -52   0  6436  636  480 S   43  0.0   0:19.74 netperf
   20 root     -51   0     0    0    0 S   11  0.0   0:04.86 softirq-net-rx/
    7 root     -51   0     0    0    0 S    7  0.0   0:03.28 softirq-net-rx/
 2370 root     -51  -5     0    0    0 S    6  0.0   0:02.62 IRQ-8408
 3858 eadi      16   0  229m 6848 5584 S    1  0.3   0:01.45 multiload-apple
 3954 eadi      15   0 12712 1096  788 R    0  0.1   0:00.19 top
    1 root      15   0 10304  664  552 S    0  0.0   0:00.75 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    3 root      RT   0     0    0    0 S    0  0.0   0:00.00 posix_cpu_timer
    4 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-high/0
    5 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-timer/0
    6 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-net-tx/
    8 root     -51   0     0    0    0 S    0  0.0   0:00.01 softirq-block/0
    9 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-tasklet
   10 root     -51   0     0    0    0 S    0  0.0   0:00.00 softirq-sched/0
############################

I think there is some kind of bad behavior happening in the Nvidia driver 
with respect to softirq-net-tx and IRQ-8406.

Any thoughts? If you want, I can post the oprofile data from both test cases.
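
[A typical oprofile capture on a kernel of this vintage would look 
roughly like the legacy opcontrol workflow below; the vmlinux path is 
an assumption:]

  $ opcontrol --setup --vmlinux=/usr/src/linux/vmlinux
  $ opcontrol --start
  $ netperf -l 100 -H 192.168.70.11 -t UDP_STREAM -- -m 1025
  $ opcontrol --stop
  $ opreport -l | head    # top symbols by sample count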

-Dave


