* IPoIB performance benchmarking
@ 2010-04-12 18:35 Tom Ammon
From: Tom Ammon @ 2010-04-12 18:35 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Brian Haymore

Hi,

I'm trying to do some performance benchmarking of IPoIB on a DDR IB 
cluster, and I am having a hard time understanding what I am seeing.

When I do a simple netperf, I get results like these:

[root@gateway3 ~]# netperf -H 192.168.23.252
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.23.252 (192.168.23.252) port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

  87380  65536  65536    10.01    4577.70


That is disappointing, since these are simply two DDR IB nodes plugged into 
a DDR switch (DDR signals at 20 Gb/s, or 16 Gb/s of data after 8b/10b 
encoding), so I would expect much higher throughput. When I run 
ibv_srq_pingpong with the same message size as above, here's what I get:

[root@gateway3 ~]# ibv_srq_pingpong 192.168.23.252 -m 4096 -s 65536
   local address:  LID 0x012b, QPN 0x000337, PSN 0x19cc85
   local address:  LID 0x012b, QPN 0x000338, PSN 0x956fc2
...
[output omitted]
...
   remote address: LID 0x0129, QPN 0x00032e, PSN 0x891ce3
131072000 bytes in 0.08 seconds = 12763.08 Mbit/sec
1000 iters in 0.08 seconds = 82.16 usec/iter

Which is much closer to what I would expect with DDR.

The MTU on both of the QLogic DDR HCAs is set to 4096, as it is on the 
QLogic switch.

I know the above is not completely apples-to-apples, since ibv_srq_pingpong 
runs at the verbs layer (below IP) and uses 16 QPs by default. So I ran it 
again with only a single QP, to make it roughly equivalent to my 
single-stream netperf test, and I still get almost double the throughput:

[root@gateway3 ~]# ibv_srq_pingpong 192.168.23.252 -m 4096 -s 65536 -q 1
   local address:  LID 0x012b, QPN 0x000347, PSN 0x65fb56
   remote address: LID 0x0129, QPN 0x00032f, PSN 0x5e52f9
131072000 bytes in 0.13 seconds = 8323.22 Mbit/sec
1000 iters in 0.13 seconds = 125.98 usec/iter


Is there something I am not understanding here? Is there any way to get 
single-stream TCP performance over IPoIB above 4.5 Gb/s on a DDR network? 
Or am I just not using the benchmarking tools correctly?

Thanks,

Tom

-- 

--------------------------------------------------------------------
Tom Ammon
Network Engineer
Office: 801.587.0976
Mobile: 801.674.9273

Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu



* Re: IPoIB performance benchmarking
@ 2010-04-12 20:19 Dave Olson
From: Dave Olson @ 2010-04-12 20:19 UTC (permalink / raw)
  To: Tom Ammon; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Brian Haymore

On Mon, 12 Apr 2010, Tom Ammon wrote:
| I'm trying to do some performance benchmarking of IPoIB on a DDR IB 
| cluster, and I am having a hard time understanding what I am seeing.
| 
| When I do a simple netperf, I get results like these:
| 
| [root@gateway3 ~]# netperf -H 192.168.23.252
| TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.23.252 (192.168.23.252) port 0 AF_INET
| Recv   Send    Send
| Socket Socket  Message  Elapsed
| Size   Size    Size     Time     Throughput
| bytes  bytes   bytes    secs.    10^6bits/sec
| 
|   87380  65536  65536    10.01    4577.70

Are you using connected mode, or datagram (UD)?  Since you say you have a 4K MTU,
I'm guessing you are using UD.  Switch to connected mode (edit
/etc/infiniband/openib.conf), or, as a quick test:

	 echo connected > /sys/class/net/ib0/mode

The IPoIB MTU should then show as 65520, which should help the bandwidth
a fair amount.
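
For what it's worth, to make that persistent across reboots on an OFED-style
install (the variable name is from memory, so double-check your openib.conf):

	 # /etc/infiniband/openib.conf
	 SET_IPOIB_CM=yes

	 # then restart the stack (or reboot)
	 /etc/init.d/openibd restart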


Dave Olson
dave.olson-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org


* Re: IPoIB performance benchmarking
@ 2010-04-12 20:52 Tom Ammon
From: Tom Ammon @ 2010-04-12 20:52 UTC (permalink / raw)
  To: Dave Olson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Brian Dale Haymore

Dave,

Thanks for the pointer. I thought it was running in connected mode, and 
checking the sysfs file you mentioned confirms it:

[root@gateway3 ~]# cat /sys/class/net/ib0/mode
connected

And the IP MTU shows up as:

[root@gateway3 ~]# ifconfig ib0
ib0       Link encap:InfiniBand  HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.23.253  Bcast:192.168.23.255  Mask:255.255.254.0
          inet6 addr: fe80::211:7500:ff:6edc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:2319010 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4512605 errors:0 dropped:33011 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:5450805352 (5.0 GiB)  TX bytes:154353169896 (143.7 GiB)


This is partly why I'm stumped: I've seen threads saying that connected 
mode should improve IPoIB performance, but I'm still not seeing the 
throughput I'd expect.

Tom


-- 
--------------------------------------------------------------------
Tom Ammon
Network Engineer
Office: 801.587.0976
Mobile: 801.674.9273

Center for High Performance Computing
University of Utah
http://www.chpc.utah.edu



* Re: IPoIB performance benchmarking
@ 2010-04-12 22:25 Dave Olson
From: Dave Olson @ 2010-04-12 22:25 UTC (permalink / raw)
  To: Tom Ammon; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Brian Dale Haymore

On Mon, 12 Apr 2010, Tom Ammon wrote:
| Thanks for the pointer. I thought it was running in connected mode, and 
| looking at that variable that you mentioned confirms it:


| [root@gateway3 ~]# ifconfig ib0
| ib0       Link encap:InfiniBand  HWaddr 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
|            inet addr:192.168.23.253  Bcast:192.168.23.255  Mask:255.255.254.0
|            RX packets:2319010 errors:0 dropped:0 overruns:0 frame:0
|            TX packets:4512605 errors:0 dropped:33011 overruns:0 carrier:0

That's a lot of packets dropped on the tx side.

If you have the QLogic software installed, running ipathstats -c1 while the
test is running would be useful; otherwise, a perfquery -r at the start and
another perfquery at the end, on both nodes, might point to something.
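
For reference, the sort of sequence I mean, run on both nodes (perfquery with
no arguments queries the local port, and -r resets the counters after reading
them, if I remember the infiniband-diags flags right):

	perfquery -r                  # read and zero the local port counters
	netperf -H 192.168.23.252     # run the test from the sending node
	perfquery                     # counters accumulated during the run

Nonzero error counters like SymbolErrorCounter, PortRcvErrors, or
PortXmitDiscards accumulated during the run would be worth chasing.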

Oh, and depending on your TCP stack tuning, setting the receive and/or send
buffer size might help.  The results below are all DDR, on a more or less
OFED 1.5.1 stack (completely unofficial, blah blah).

And yes, multiple streams will bring the results up (iperf, rather than
netperf); a rough sketch of such a run follows the numbers below.

# netperf -H ib-host TCP_STREAM -- -m 65536      
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to ib-host (172.29.9.46) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  65536  65536    10.03    5150.24   
# netperf -H ib-host TCP_STREAM -- -m 65536 -S 131072
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to ib-host (172.29.9.46) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

262144  65536  65536    10.03    5401.83   

# netperf -H ib-host TCP_STREAM -- -m 65536 -S 262144
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to ib-host (172.29.9.46) port 0 AF_INET
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

524288  65536  65536    10.01    5478.28   
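
Since both buffer sizes and multiple streams came up, here is a rough sketch
of what I mean (the sysctl values and stream count are illustrative, not
tuned numbers):

	# raise the socket buffer ceilings so large -S/-w requests take effect
	sysctl -w net.core.rmem_max=4194304
	sysctl -w net.core.wmem_max=4194304

	# receiver
	iperf -s
	# sender: 4 parallel TCP streams for 30 seconds
	iperf -c 192.168.23.252 -P 4 -t 30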


Dave Olson
dave.olson-h88ZbnxC6KDQT0dZR+AlfA@public.gmane.org


