* IPoIB performance
From: Atchley, Scott @ 2012-08-29 19:35 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi all,

I am benchmarking a sockets-based application and I want a sanity check on
IPoIB performance expectations when using connected mode (65520 MTU). I am
using the tuning tips in Documentation/infiniband/ipoib.txt. The machines
have Mellanox QDR cards (see below for the verbose ibv_devinfo output). I am
using a 2.6.36 kernel. The hosts have a single-socket Intel E5520 (4 cores
with hyper-threading on) at 2.27 GHz.

I am using netperf's TCP_STREAM test and binding cores. The best I have seen
is ~13 Gbps. Is this the best I can expect from these cards?

What should I expect as a max for IPoIB with FDR cards?

Thanks,

Scott

hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.7.626
        node_guid:                      0002:c903:000b:6520
        sys_image_guid:                 0002:c903:000b:6523
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xB0
        board_id:                       MT_0D90110009
        phys_port_cnt:                  1
        max_mr_size:                    0xffffffffffffffff
        page_size_cap:                  0xfffffe00
        max_qp:                         65464
        max_qp_wr:                      16384
        device_cap_flags:               0x006c9c76
        max_sge:                        32
        max_sge_rd:                     0
        max_cq:                         65408
        max_cqe:                        4194303
        max_mr:                         131056
        max_pd:                         32764
        max_qp_rd_atom:                 16
        max_ee_rd_atom:                 0
        max_res_rd_atom:                1047424
        max_qp_init_rd_atom:            128
        max_ee_init_rd_atom:            0
        atomic_cap:                     ATOMIC_HCA (1)
        max_ee:                         0
        max_rdd:                        0
        max_mw:                         0
        max_raw_ipv6_qp:                0
        max_raw_ethy_qp:                0
        max_mcast_grp:                  8192
        max_mcast_qp_attach:            56
        max_total_mcast_qp_attach:      458752
        max_ah:                         0
        max_fmr:                        0
        max_srq:                        65472
        max_srq_wr:                     16383
        max_srq_sge:                    31
        max_pkeys:                      128
        local_ca_ack_delay:             15
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 6
                        port_lid:               8
                        port_lmc:               0x00
                        link_layer:             InfiniBand
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x02510868
                        max_vl_num:             8 (4)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           128
                        gid_tbl_len:            128
                        subnet_timeout:         18
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           10.0 Gbps (4)
                        phys_state:             LINK_UP (5)
                        GID[  0]:               fe80:0000:0000:0000:0002:c903:000b:6521
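A minimal sketch of the kind of setup and run described above, assuming the
IPoIB interface is named ib0 and netserver is already running on the remote
host; the interface name, address, and CPU numbers are illustrative, not
taken from the original report:

    # connected mode with the maximum IPoIB MTU
    # (per the tips in Documentation/infiniband/ipoib.txt)
    echo connected > /sys/class/net/ib0/mode
    ip link set ib0 mtu 65520

    # single TCP stream for 30 seconds, binding netperf/netserver
    # to one core on each host with -T <local,remote>
    netperf -H 192.168.0.2 -t TCP_STREAM -l 30 -T 1,1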
* Re: IPoIB performance
From: Christoph Lameter @ 2012-09-05 15:51 UTC
To: Atchley, Scott; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 29 Aug 2012, Atchley, Scott wrote:

> I am benchmarking a sockets-based application and I want a sanity check
> on IPoIB performance expectations when using connected mode (65520 MTU).
> I am using the tuning tips in Documentation/infiniband/ipoib.txt. The
> machines have Mellanox QDR cards (see below for the verbose ibv_devinfo
> output). I am using a 2.6.36 kernel. The hosts have a single-socket Intel
> E5520 (4 cores with hyper-threading on) at 2.27 GHz.
>
> I am using netperf's TCP_STREAM test and binding cores. The best I have
> seen is ~13 Gbps. Is this the best I can expect from these cards?

Sounds about right. This is not a hardware limitation but a limitation of
the socket I/O layer / PCI-E bus. The cards generally can process more data
than the PCI bus and the OS can handle.

PCI-E 2.0 should give you up to about 2.3 GBytes/sec with these NICs, so
there is likely something that the network layer does that limits the
bandwidth.

> What should I expect as a max for IPoIB with FDR cards?

More of the same. You may want to:

A) Increase the block size handled by the socket layer.

B) Increase the bandwidth by using PCI-E 3 or more PCI-E lanes.

C) Bypass the socket layer. Look at Sean's rsockets layer, for example.
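For option C, a sketch of how rsockets can be tried without modifying the
application, assuming a librdmacm build that ships the rsocket preload shim;
the library path varies by distribution and is illustrative here:

    # run an unmodified sockets benchmark over rsockets via LD_PRELOAD
    # (path to librspreload.so depends on how librdmacm was installed)
    LD_PRELOAD=/usr/lib64/rsocket/librspreload.so \
        netperf -H 192.168.0.2 -t TCP_STREAM -l 30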
* Re: IPoIB performance
From: Atchley, Scott @ 2012-09-05 17:09 UTC
To: Christoph Lameter; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sep 5, 2012, at 11:51 AM, Christoph Lameter wrote:

> On Wed, 29 Aug 2012, Atchley, Scott wrote:
>
>> I am benchmarking a sockets-based application and I want a sanity check
>> on IPoIB performance expectations when using connected mode (65520 MTU).
>> I am using the tuning tips in Documentation/infiniband/ipoib.txt. The
>> machines have Mellanox QDR cards (see below for the verbose ibv_devinfo
>> output). I am using a 2.6.36 kernel. The hosts have a single-socket Intel
>> E5520 (4 cores with hyper-threading on) at 2.27 GHz.
>>
>> I am using netperf's TCP_STREAM test and binding cores. The best I have
>> seen is ~13 Gbps. Is this the best I can expect from these cards?
>
> Sounds about right. This is not a hardware limitation but a limitation of
> the socket I/O layer / PCI-E bus. The cards generally can process more
> data than the PCI bus and the OS can handle.
>
> PCI-E 2.0 should give you up to about 2.3 GBytes/sec with these NICs, so
> there is likely something that the network layer does that limits the
> bandwidth.

First, thanks for the reply.

I am not sure where you are getting the 2.3 GB/s value. When using verbs
natively, I can get ~3.4 GB/s.

I am assuming that these HCAs lack certain TCP offloads that might allow
higher socket performance. Ethtool reports:

# ethtool -k ib0
Offload parameters for ib0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
generic-receive-offload: off

There is no checksum support, which I would expect to lower performance.
Since checksums need to be calculated on the host, I would expect faster
processors to help performance some.

So basically, am I in the ballpark given this hardware?

>> What should I expect as a max for IPoIB with FDR cards?
>
> More of the same. You may want to:
>
> A) Increase the block size handled by the socket layer.

Do you mean altering sysctl with something like:

# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limits
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# increase the length of the processor input queue
net.core.netdev_max_backlog = 30000

or increasing the SO_SNDBUF and SO_RCVBUF sizes, or something else?

> B) Increase the bandwidth by using PCI-E 3 or more PCI-E lanes.
>
> C) Bypass the socket layer. Look at Sean's rsockets layer, for example.

We actually want to test the socket stack and not bypass it.

Thanks again!

Scott
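As an aside, a sketch of how the socket buffer and send sizes can also be
pinned directly from netperf for a single run, rather than only through
sysctl; the sizes are illustrative, and note that setting SO_SNDBUF and
SO_RCVBUF explicitly disables the kernel's autotuning for that socket:

    # request 16 MB socket buffers and 64 KB sends explicitly
    # (test-specific options follow the "--" separator)
    netperf -H 192.168.0.2 -t TCP_STREAM -l 30 -T 1,1 -- \
        -s 16777216 -S 16777216 -m 65536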
* Re: IPoIB performance
From: Christoph Lameter @ 2012-09-05 18:20 UTC
To: Atchley, Scott; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 5 Sep 2012, Atchley, Scott wrote:

> # ethtool -k ib0
> Offload parameters for ib0:
> rx-checksumming: off
> tx-checksumming: off
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: on
> generic-receive-offload: off
>
> There is no checksum support, which I would expect to lower performance.
> Since checksums need to be calculated on the host, I would expect faster
> processors to help performance some.

OK, that is a major problem. Both are on by default here. What NIC is this?

>> A) Increase the block size handled by the socket layer.
>
> Do you mean altering sysctl with something like:

Nope, increase the MTU. Connected mode supports up to a 64k MTU size, I
believe.

> or increasing the SO_SNDBUF and SO_RCVBUF sizes, or something else?

That does nothing for performance. The problem is that the handling of the
data by the kernel causes too much latency, so you cannot reach the full
bandwidth of the hardware.

> We actually want to test the socket stack and not bypass it.

AFAICT the network stack is useful up to 1 Gbps, and after that more and
more band-aid comes into play.
* Re: IPoIB performance
From: Atchley, Scott @ 2012-09-05 18:30 UTC
To: Christoph Lameter; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sep 5, 2012, at 2:20 PM, Christoph Lameter wrote:

> On Wed, 5 Sep 2012, Atchley, Scott wrote:
>
>> # ethtool -k ib0
>> Offload parameters for ib0:
>> rx-checksumming: off
>> tx-checksumming: off
>> scatter-gather: off
>> tcp segmentation offload: off
>> udp fragmentation offload: off
>> generic segmentation offload: on
>> generic-receive-offload: off
>>
>> There is no checksum support, which I would expect to lower performance.
>> Since checksums need to be calculated on the host, I would expect faster
>> processors to help performance some.
>
> OK, that is a major problem. Both are on by default here. What NIC is this?

These are Mellanox QDR HCAs (board id is MT_0D90110009). The full output of
ibv_devinfo is in my original post.

>>> A) Increase the block size handled by the socket layer.
>>
>> Do you mean altering sysctl with something like:
>
> Nope, increase the MTU. Connected mode supports up to a 64k MTU size, I
> believe.

Yes, I am using the max MTU (65520).

>> or increasing the SO_SNDBUF and SO_RCVBUF sizes, or something else?
>
> That does nothing for performance. The problem is that the handling of the
> data by the kernel causes too much latency, so you cannot reach the full
> bandwidth of the hardware.
>
>> We actually want to test the socket stack and not bypass it.
>
> AFAICT the network stack is useful up to 1 Gbps, and after that more and
> more band-aid comes into play.

Hmm, many 10G Ethernet NICs can reach line rate. I have not yet tested any
40G Ethernet NICs, but I hope that they will get close to line rate. If not,
what is the point? ;-)

Scott
* Re: IPoIB performance
From: Christoph Lameter @ 2012-09-05 19:06 UTC
To: Atchley, Scott; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 5 Sep 2012, Atchley, Scott wrote:

>> AFAICT the network stack is useful up to 1 Gbps, and after that more and
>> more band-aid comes into play.
>
> Hmm, many 10G Ethernet NICs can reach line rate. I have not yet tested any
> 40G Ethernet NICs, but I hope that they will get close to line rate. If
> not, what is the point? ;-)

Oh yes, they can under restricted circumstances: large packets, multiple
cores, etc. With the band-aids...
* Re: IPoIB performance
From: Atchley, Scott @ 2012-09-05 19:48 UTC
To: Christoph Lameter; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sep 5, 2012, at 3:06 PM, Christoph Lameter wrote:

> On Wed, 5 Sep 2012, Atchley, Scott wrote:
>
>>> AFAICT the network stack is useful up to 1 Gbps, and after that more and
>>> more band-aid comes into play.
>>
>> Hmm, many 10G Ethernet NICs can reach line rate. I have not yet tested any
>> 40G Ethernet NICs, but I hope that they will get close to line rate. If
>> not, what is the point? ;-)
>
> Oh yes, they can under restricted circumstances: large packets, multiple
> cores, etc. With the band-aids...

With Myricom 10G NICs, for example, you just need one core and it can do
line rate with a 1500-byte MTU. Do you count the stateless offloads as
band-aids? Or something else?

I have not tested any 40G NICs yet, but I imagine that one core will not be
enough.

Thanks,

Scott
* Re: IPoIB performance
From: Christoph Lameter @ 2012-09-05 19:53 UTC
To: Atchley, Scott; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 5 Sep 2012, Atchley, Scott wrote:

> With Myricom 10G NICs, for example, you just need one core and it can do
> line rate with a 1500-byte MTU. Do you count the stateless offloads as
> band-aids? Or something else?

The stateless aids also have certain limitations. It's a grey zone whether
you want to call them band-aids; it gets there at some point, because
stateless offload can only get you so far. The need to send larger-sized
packets through the kernel increases the latency and forces the app to do
larger batching. It's not very useful if you need to send small packets to
a variety of receivers.
* Re: IPoIB performance
From: Ezra Kissel @ 2012-09-05 20:12 UTC
To: Atchley, Scott; +Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 9/5/2012 3:48 PM, Atchley, Scott wrote:
> On Sep 5, 2012, at 3:06 PM, Christoph Lameter wrote:
>
>> On Wed, 5 Sep 2012, Atchley, Scott wrote:
>>
>>>> AFAICT the network stack is useful up to 1 Gbps, and after that more
>>>> and more band-aid comes into play.
>>>
>>> Hmm, many 10G Ethernet NICs can reach line rate. I have not yet tested
>>> any 40G Ethernet NICs, but I hope that they will get close to line rate.
>>> If not, what is the point? ;-)
>>
>> Oh yes, they can under restricted circumstances: large packets, multiple
>> cores, etc. With the band-aids...
>
> With Myricom 10G NICs, for example, you just need one core and it can do
> line rate with a 1500-byte MTU. Do you count the stateless offloads as
> band-aids? Or something else?
>
> I have not tested any 40G NICs yet, but I imagine that one core will not
> be enough.

Since you are using netperf, you might also consider experimenting with the
TCP_SENDFILE test. Using sendfile/splice calls can have a significant impact
for sockets-based apps.

Using 40G NICs (Mellanox ConnectX-3 EN), I've seen our applications hit
22 Gb/s single core/stream while fully CPU bound. With sendfile/splice,
there is no issue saturating a 40G link with about 40-50% core utilization.
That being said, binding to the right core/node, message size and memory
alignment, interrupt handling, and proper host/NIC tuning all have an impact
on the performance. The state of high-performance networking is certainly
not plug-and-play.

- ezra
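A sketch of the TCP_SENDFILE experiment suggested above, assuming a netperf
build that includes the sendfile test; the file path and sizes are
illustrative:

    # create a file for netperf to transmit via sendfile()
    dd if=/dev/zero of=/tmp/netperf.fill bs=1M count=1024

    # TCP_SENDFILE sends the file given with -F using sendfile()
    # instead of send() from a user-space buffer
    netperf -H 192.168.0.2 -t TCP_SENDFILE -l 30 -T 1,1 \
        -F /tmp/netperf.fill -- -m 65536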
* Re: IPoIB performance
From: Atchley, Scott @ 2012-09-05 20:32 UTC
To: Ezra Kissel; +Cc: Christoph Lameter, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sep 5, 2012, at 4:12 PM, Ezra Kissel wrote:

> On 9/5/2012 3:48 PM, Atchley, Scott wrote:
>> On Sep 5, 2012, at 3:06 PM, Christoph Lameter wrote:
>>
>>> On Wed, 5 Sep 2012, Atchley, Scott wrote:
>>>
>>>>> AFAICT the network stack is useful up to 1 Gbps, and after that more
>>>>> and more band-aid comes into play.
>>>>
>>>> Hmm, many 10G Ethernet NICs can reach line rate. I have not yet tested
>>>> any 40G Ethernet NICs, but I hope that they will get close to line
>>>> rate. If not, what is the point? ;-)
>>>
>>> Oh yes, they can under restricted circumstances: large packets, multiple
>>> cores, etc. With the band-aids...
>>
>> With Myricom 10G NICs, for example, you just need one core and it can do
>> line rate with a 1500-byte MTU. Do you count the stateless offloads as
>> band-aids? Or something else?
>>
>> I have not tested any 40G NICs yet, but I imagine that one core will not
>> be enough.
>
> Since you are using netperf, you might also consider experimenting with
> the TCP_SENDFILE test. Using sendfile/splice calls can have a significant
> impact for sockets-based apps.
>
> Using 40G NICs (Mellanox ConnectX-3 EN), I've seen our applications hit
> 22 Gb/s single core/stream while fully CPU bound. With sendfile/splice,
> there is no issue saturating a 40G link with about 40-50% core
> utilization. That being said, binding to the right core/node, message
> size and memory alignment, interrupt handling, and proper host/NIC
> tuning all have an impact on the performance. The state of
> high-performance networking is certainly not plug-and-play.

Thanks for the tip. The app we want to test does not use sendfile() or
splice(). I do bind to the "best" core (determined by testing all
combinations on client and server).

I have heard others within DOE reach ~16 Gb/s on a 40G Mellanox NIC. I'm
glad to hear that you got to 22 Gb/s for a single stream. That is more
reassuring.

Scott
* Re: IPoIB performance
From: Christoph Lameter @ 2012-09-05 19:13 UTC
To: Atchley, Scott; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 5 Sep 2012, Atchley, Scott wrote:

> These are Mellanox QDR HCAs (board id is MT_0D90110009). The full output
> of ibv_devinfo is in my original post.

Hmmm... You are running an old kernel. What version of OFED do you use?
* Re: IPoIB performance
From: Atchley, Scott @ 2012-09-05 19:52 UTC
To: Christoph Lameter; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sep 5, 2012, at 3:13 PM, Christoph Lameter wrote:

> On Wed, 5 Sep 2012, Atchley, Scott wrote:
>
>> These are Mellanox QDR HCAs (board id is MT_0D90110009). The full output
>> of ibv_devinfo is in my original post.
>
> Hmmm... You are running an old kernel. What version of OFED do you use?

Hah, if you think my kernel is old, you should see my userland (RHEL 5.5). ;-)

Does the version of OFED impact the kernel modules? I am using the modules
that came with the kernel. I don't believe that libibverbs or librdmacm are
used by the kernel's socket stack. That said, I am using source builds with
tags libibverbs-1.1.6 and v1.0.16 (librdmacm).

Scott
* Re: IPoIB performance
From: Christoph Lameter @ 2012-09-05 20:26 UTC
To: Atchley, Scott; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Wed, 5 Sep 2012, Atchley, Scott wrote:

>> Hmmm... You are running an old kernel. What version of OFED do you use?
>
> Hah, if you think my kernel is old, you should see my userland (RHEL 5.5). ;-)

My condolences.

> Does the version of OFED impact the kernel modules? I am using the modules
> that came with the kernel. I don't believe that libibverbs or librdmacm
> are used by the kernel's socket stack. That said, I am using source builds
> with tags libibverbs-1.1.6 and v1.0.16 (librdmacm).

OFED includes kernel modules which provide the drivers that you need.
Installing a new OFED release on RHEL 5 is possible and would give you
up-to-date drivers. Check with RH: they may have them somewhere easy to
install for your version of RHEL.
* Re: IPoIB performance
From: Reeted @ 2012-09-05 17:52 UTC
To: Christoph Lameter; +Cc: Atchley, Scott, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 09/05/12 17:51, Christoph Lameter wrote:

> PCI-E 2.0 should give you up to about 2.3 GBytes/sec with these NICs, so
> there is likely something that the network layer does that limits the
> bandwidth.

I think those are 8-lane PCI-E 2.0, so that would be 500 MB/sec x 8, which
is 4 GBytes/sec. Or do you really mean there is almost 50% overhead?
* Re: IPoIB performance
From: Reeted @ 2012-09-05 17:50 UTC
To: Atchley, Scott; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 08/29/12 21:35, Atchley, Scott wrote:

> Hi all,
>
> I am benchmarking a sockets-based application and I want a sanity check on
> IPoIB performance expectations when using connected mode (65520 MTU).....

I have read that with newer cards the datagram (unconnected) mode is faster
for IPoIB than connected mode. Do you want to check?

What benchmark program are you using?
* Re: IPoIB performance
From: Atchley, Scott @ 2012-09-05 17:59 UTC
To: Reeted; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Sep 5, 2012, at 1:50 PM, Reeted wrote:

> On 08/29/12 21:35, Atchley, Scott wrote:
>> Hi all,
>>
>> I am benchmarking a sockets-based application and I want a sanity check on
>> IPoIB performance expectations when using connected mode (65520 MTU).....
>
> I have read that with newer cards the datagram (unconnected) mode is faster
> for IPoIB than connected mode. Do you want to check?

I have read that the latency is lower (better) but the bandwidth is lower.

Using datagram mode limits the MTU to 2044 and the throughput to ~3 Gb/s on
these machines/cards. Connected mode at the same MTU performs roughly the
same. The win in connected mode comes with larger MTUs. With a 9000 MTU, I
see ~6 Gb/s. Pushing the MTU to 65520 (the maximum for IPoIB), I can get
~13 Gb/s.

> What benchmark program are you using?

netperf with process binding (-T). I tune sysctl per the DOE FasterData
specs: http://fasterdata.es.net/host-tuning/linux/

Scott
* Re: IPoIB performance
From: Reeted @ 2012-09-05 19:04 UTC
To: Atchley, Scott; +Cc: linux-rdma

On 09/05/12 19:59, Atchley, Scott wrote:
> On Sep 5, 2012, at 1:50 PM, Reeted wrote:
>
>> I have read that with newer cards the datagram (unconnected) mode is
>> faster for IPoIB than connected mode. Do you want to check?
>
> I have read that the latency is lower (better) but the bandwidth is lower.
>
> Using datagram mode limits the MTU to 2044 and the throughput to ~3 Gb/s
> on these machines/cards. Connected mode at the same MTU performs roughly
> the same. The win in connected mode comes with larger MTUs. With a 9000
> MTU, I see ~6 Gb/s. Pushing the MTU to 65520 (the maximum for IPoIB), I
> can get ~13 Gb/s.

Have a look at an old thread on this ML by Sebastien Dugue, "IPoIB to
Ethernet routing performance". He had numbers much higher than yours on
similar hardware, and was advised to use datagram mode to achieve offloading
and even higher speeds.

Keep me informed if you can fix this. I am interested but can't test
InfiniBand myself right now.
* Re: IPoIB performance
From: Atchley, Scott @ 2012-09-05 19:46 UTC
To: Reeted; +Cc: linux-rdma

On Sep 5, 2012, at 3:04 PM, Reeted wrote:

> On 09/05/12 19:59, Atchley, Scott wrote:
>> On Sep 5, 2012, at 1:50 PM, Reeted wrote:
>>
>>> I have read that with newer cards the datagram (unconnected) mode is
>>> faster for IPoIB than connected mode. Do you want to check?
>>
>> I have read that the latency is lower (better) but the bandwidth is lower.
>>
>> Using datagram mode limits the MTU to 2044 and the throughput to ~3 Gb/s
>> on these machines/cards. Connected mode at the same MTU performs roughly
>> the same. The win in connected mode comes with larger MTUs. With a 9000
>> MTU, I see ~6 Gb/s. Pushing the MTU to 65520 (the maximum for IPoIB), I
>> can get ~13 Gb/s.
>
> Have a look at an old thread on this ML by Sebastien Dugue, "IPoIB to
> Ethernet routing performance". He had numbers much higher than yours on
> similar hardware, and was advised to use datagram mode to achieve
> offloading and even higher speeds.
>
> Keep me informed if you can fix this. I am interested but can't test
> InfiniBand myself right now.

He claims 20 Gb/s, and Or replies that one should also get near 20 Gb/s
using datagram mode. I checked, and datagram mode shows support via ethtool
for more offloads. In my case, I still see better performance with connected
mode.

Thanks,

Scott
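A sketch of how such a datagram-versus-connected comparison might be run,
assuming the IPoIB interface is ib0; the exact offloads reported by ethtool
depend on the HCA and driver, and on some kernels the interface may need to
be brought down before the mode can be changed:

    # datagram mode: smaller MTU, but the driver may expose checksum/LSO offloads
    echo datagram > /sys/class/net/ib0/mode
    ip link set ib0 mtu 2044
    ethtool -k ib0

    # connected mode: offloads fall back to software, but the MTU can be much larger
    echo connected > /sys/class/net/ib0/mode
    ip link set ib0 mtu 65520
    ethtool -k ib0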