* Soft-RoCE performance
@ 2022-02-10  3:33 Christian Blume
  2022-02-10  5:13 ` Pearson, Robert B
  0 siblings, 1 reply; 7+ messages in thread
From: Christian Blume @ 2022-02-10  3:33 UTC (permalink / raw)
  To: RDMA mailing list

Hello!

I am seeing that Soft-RoCE has much lower throughput than TCP. Is that
expected? If not, are there typical config parameters I can fiddle
with?

When running iperf I am getting around 300MB/s whereas it's only
around 100MB/s using ib_write_bw from perftests.

This is between two machines running Ubuntu 20.04 with the 5.11 kernel.

Cheers,
Christian


* RE: Soft-RoCE performance
  2022-02-10  3:33 Soft-RoCE performance Christian Blume
@ 2022-02-10  5:13 ` Pearson, Robert B
  2022-02-10  5:28   ` Pearson, Robert B
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Pearson, Robert B @ 2022-02-10  5:13 UTC (permalink / raw)
  To: Christian Blume, RDMA mailing list

Christian,

There are two key differences between TCP and soft RoCE. Most importantly, TCP can use a 64KiB MTU, which is fragmented by TSO (or by GSO if your NIC doesn't support TSO), while soft RoCE is limited by the protocol to a 4KiB payload. With header overhead you need a link MTU of about 4096+256. If your application is going between soft RoCE and hard RoCE you have to live with this limit and compute the ICRC on each packet. Checking the ICRC on receive is optional, since RoCE packets already carry a CRC32 checksum from Ethernet. If you are going soft RoCE to soft RoCE you can skip both ICRC calculations and, with a patch, increase the MTU above 4KiB. I have measured write performance up to around 35 GB/s in local loopback on a single 12-core box (AMD 3900x) using 12 IO threads, a 16KiB MTU, and ICRC disabled, for 1MB messages. This is on head of tree with some patches not yet upstream.
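
To make the packet-count difference concrete, here is a minimal sketch of the arithmetic (my illustration only; it just divides the message size by the payload that each pass through the stack can carry):

    #include <stdio.h>

    /* Packets the software stack must build per message: soft RoCE is
     * capped at a 4KiB payload, while TCP with TSO/GSO hands the NIC
     * ~64KiB segments and leaves the wire-level splitting to hardware. */
    int main(void)
    {
        const long msg  = 1024 * 1024;  /* 1MB message, as in the test above */
        const long roce = 4096;         /* RoCE protocol payload cap */
        const long tso  = 65536;        /* 64KiB TSO/GSO segment */

        printf("soft RoCE: %ld packets per message\n", (msg + roce - 1) / roce);
        printf("TCP + TSO: %ld segments per message\n", (msg + tso - 1) / tso);
        return 0;
    }

That is 256 stack traversals, each with an ICRC to compute, against 16, which accounts for most of the gap before any tuning.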

Bob Pearson
rpearsonhpe@gmail.com
rpearson@hpe.com


-----Original Message-----
From: Christian Blume <chr.blume@gmail.com> 
Sent: Wednesday, February 9, 2022 9:34 PM
To: RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Soft-RoCE performance

Hello!

I am seeing that Soft-RoCE has much lower throughput than TCP. Is that expected? If not, are there typical config parameters I can fiddle with?

When running iperf I am getting around 300MB/s whereas it's only around 100MB/s using ib_write_bw from perftests.

This is between two machines running Ubuntu 20.04 with the 5.11 kernel.

Cheers,
Christian


* RE: Soft-RoCE performance
  2022-02-10  5:13 ` Pearson, Robert B
@ 2022-02-10  5:28   ` Pearson, Robert B
  2022-02-10 14:04   ` Yanjun Zhu
  2022-02-11 10:53   ` Bernard Metzler
  2 siblings, 0 replies; 7+ messages in thread
From: Pearson, Robert B @ 2022-02-10  5:28 UTC (permalink / raw)
  To: Pearson, Robert B, Christian Blume, RDMA mailing list

Sorry, I got distracted and forgot to mention the other thing: ib_write_bw is limited to a single-threaded process. If you run more IO threads you can improve performance. ib_write_bw will let you use more than one QP, but they all run on a single core, so it doesn't scale all that well.
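
As a rough sketch of that workaround, something like the following drives several ib_write_bw clients in parallel, one pinned to each core (this assumes perftest's -d and -p options and a matching set of servers already listening on consecutive ports on the peer; the flags are from memory, so check your perftest version):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NPROC     4        /* parallel clients, one per core */
    #define BASE_PORT 18515    /* perftest's default TCP port */

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <rdma-dev> <server-addr>\n", argv[0]);
            return 1;
        }
        for (int i = 0; i < NPROC; i++) {
            if (fork() == 0) {
                cpu_set_t set;
                char port[16];

                CPU_ZERO(&set);            /* pin this client to core i */
                CPU_SET(i, &set);
                sched_setaffinity(0, sizeof(set), &set);
                snprintf(port, sizeof(port), "%d", BASE_PORT + i);
                execlp("ib_write_bw", "ib_write_bw",
                       "-d", argv[1], "-p", port, argv[2], (char *)NULL);
                perror("execlp");          /* only reached if exec failed */
                _exit(1);
            }
        }
        for (int i = 0; i < NPROC; i++)
            wait(NULL);                    /* reap all clients */
        return 0;
    }

Summing the per-instance bandwidth numbers then gives the aggregate.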

Bob

-----Original Message-----
From: Pearson, Robert B <robert.pearson2@hpe.com> 
Sent: Wednesday, February 9, 2022 11:13 PM
To: Christian Blume <chr.blume@gmail.com>; RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: RE: Soft-RoCE performance

Christian,

There are two key differences between TCP and soft RoCE. Most importantly, TCP can use a 64KiB MTU, which is fragmented by TSO (or by GSO if your NIC doesn't support TSO), while soft RoCE is limited by the protocol to a 4KiB payload. With header overhead you need a link MTU of about 4096+256. If your application is going between soft RoCE and hard RoCE you have to live with this limit and compute the ICRC on each packet. Checking the ICRC on receive is optional, since RoCE packets already carry a CRC32 checksum from Ethernet. If you are going soft RoCE to soft RoCE you can skip both ICRC calculations and, with a patch, increase the MTU above 4KiB. I have measured write performance up to around 35 GB/s in local loopback on a single 12-core box (AMD 3900x) using 12 IO threads, a 16KiB MTU, and ICRC disabled, for 1MB messages. This is on head of tree with some patches not yet upstream.

Bob Pearson
rpearsonhpe@gmail.com
rpearson@hpe.com


-----Original Message-----
From: Christian Blume <chr.blume@gmail.com> 
Sent: Wednesday, February 9, 2022 9:34 PM
To: RDMA mailing list <linux-rdma@vger.kernel.org>
Subject: Soft-RoCE performance

Hello!

I am seeing that Soft-RoCE has much lower throughput than TCP. Is that expected? If not, are there typical config parameters I can fiddle with?

When running iperf I am getting around 300MB/s whereas it's only around 100MB/s using ib_write_bw from perftests.

This is between two machines running Ubuntu 20.04 with the 5.11 kernel.

Cheers,
Christian


* Re: Soft-RoCE performance
  2022-02-10  5:13 ` Pearson, Robert B
  2022-02-10  5:28   ` Pearson, Robert B
@ 2022-02-10 14:04   ` Yanjun Zhu
  2022-02-10 22:23     ` Christian Blume
  2022-02-11 17:48     ` Bob Pearson
  2022-02-11 10:53   ` Bernard Metzler
  2 siblings, 2 replies; 7+ messages in thread
From: Yanjun Zhu @ 2022-02-10 14:04 UTC (permalink / raw)
  To: Pearson, Robert B, Christian Blume, RDMA mailing list

On 2022/2/10 13:13, Pearson, Robert B wrote:
> Christian,
> 
> There are two key differences between TCP and soft RoCE. Most importantly, TCP can use a 64KiB MTU, which is fragmented by TSO (or by GSO if your NIC doesn't support TSO), while soft RoCE is limited by the protocol to a 4KiB payload. With header overhead you need a link MTU of about 4096+256. If your application is going between soft RoCE and hard RoCE you have to live with this limit and compute the ICRC on each packet. Checking the ICRC on receive is optional, since RoCE packets already carry a CRC32 checksum from Ethernet. If you are going soft RoCE to soft RoCE you can skip both ICRC calculations and, with a patch, increase the MTU above 4KiB. I have measured write performance up to around 35 GB/s

Thanks, I have also reached similarly high bandwidth with the same methods. How about the latency of soft RoCE?

Zhu Yanjun


> in local loopback on a single 12-core box (AMD 3900x) using 12 IO threads, a 16KiB MTU, and ICRC disabled, for 1MB messages. This is on head of tree with some patches not yet upstream.
> 
> Bob Pearson
> rpearsonhpe@gmail.com
> rpearson@hpe.com
> 
> 
> -----Original Message-----
> From: Christian Blume <chr.blume@gmail.com>
> Sent: Wednesday, February 9, 2022 9:34 PM
> To: RDMA mailing list <linux-rdma@vger.kernel.org>
> Subject: Soft-RoCE performance
> 
> Hello!
> 
> I am seeing that Soft-RoCE has much lower throughput than TCP. Is that expected? If not, are there typical config parameters I can fiddle with?
> 
> When running iperf I am getting around 300MB/s whereas it's only around 100MB/s using ib_write_bw from perftests.
> 
> This is between two machines running Ubuntu 20.04 with the 5.11 kernel.
> 
> Cheers,
> Christian



* Re: Soft-RoCE performance
  2022-02-10 14:04   ` Yanjun Zhu
@ 2022-02-10 22:23     ` Christian Blume
  2022-02-11 17:48     ` Bob Pearson
  1 sibling, 0 replies; 7+ messages in thread
From: Christian Blume @ 2022-02-10 22:23 UTC (permalink / raw)
  To: Yanjun Zhu; +Cc: Pearson, Robert B, RDMA mailing list

Hi Bob,

Thanks for clarifying! I am looking forward to testing larger MTUs
with the new functionality!

Cheers,
Christian

On Fri, Feb 11, 2022 at 3:04 AM Yanjun Zhu <yanjun.zhu@linux.dev> wrote:
>
> On 2022/2/10 13:13, Pearson, Robert B wrote:
> > Christian,
> >
> > There are two key differences between TCP and soft RoCE. Most importantly, TCP can use a 64KiB MTU, which is fragmented by TSO (or by GSO if your NIC doesn't support TSO), while soft RoCE is limited by the protocol to a 4KiB payload. With header overhead you need a link MTU of about 4096+256. If your application is going between soft RoCE and hard RoCE you have to live with this limit and compute the ICRC on each packet. Checking the ICRC on receive is optional, since RoCE packets already carry a CRC32 checksum from Ethernet. If you are going soft RoCE to soft RoCE you can skip both ICRC calculations and, with a patch, increase the MTU above 4KiB. I have measured write performance up to around 35 GB/s
>
> Thanks, I have also reached similarly high bandwidth with the same methods. How about the latency of soft RoCE?
>
> Zhu Yanjun
>
>
> > in local loopback on a single 12-core box (AMD 3900x) using 12 IO threads, a 16KiB MTU, and ICRC disabled, for 1MB messages. This is on head of tree with some patches not yet upstream.
> >
> > Bob Pearson
> > rpearsonhpe@gmail.com
> > rpearson@hpe.com
> >
> >
> > -----Original Message-----
> > From: Christian Blume <chr.blume@gmail.com>
> > Sent: Wednesday, February 9, 2022 9:34 PM
> > To: RDMA mailing list <linux-rdma@vger.kernel.org>
> > Subject: Soft-RoCE performance
> >
> > Hello!
> >
> > I am seeing that Soft-RoCE has much lower throughput than TCP. Is that expected? If not, are there typical config parameters I can fiddle with?
> >
> > When running iperf I am getting around 300MB/s whereas it's only around 100MB/s using ib_write_bw from perftests.
> >
> > This is between two machines running Ubuntu 20.04 with the 5.11 kernel.
> >
> > Cheers,
> > Christian
>


* RE: Soft-RoCE performance
  2022-02-10  5:13 ` Pearson, Robert B
  2022-02-10  5:28   ` Pearson, Robert B
  2022-02-10 14:04   ` Yanjun Zhu
@ 2022-02-11 10:53   ` Bernard Metzler
  2 siblings, 0 replies; 7+ messages in thread
From: Bernard Metzler @ 2022-02-11 10:53 UTC (permalink / raw)
  To: Pearson, Robert B, Christian Blume, RDMA mailing list; +Cc: krishna2

> -----Original Message-----
> From: Pearson, Robert B <robert.pearson2@hpe.com>
> Sent: Thursday, 10 February 2022 06:13
> To: Christian Blume <chr.blume@gmail.com>; RDMA mailing list <linux-
> rdma@vger.kernel.org>
> Subject: [EXTERNAL] RE: Soft-RoCE performance
> 
> Christian,
> 
> There are two key differences between TCP and soft RoCE. Most importantly,
> TCP can use a 64KiB MTU, which is fragmented by TSO (or by GSO if your NIC
> doesn't support TSO), while soft RoCE is limited by the protocol to a 4KiB
> payload. With header overhead you need a link MTU of about 4096+256. If
> your application is going between soft RoCE and hard RoCE you have to live
> with this limit and compute the ICRC on each packet. Checking the ICRC on
> receive is optional, since RoCE packets already carry a CRC32 checksum from
> Ethernet. If you are going soft RoCE to soft RoCE you can skip both ICRC
> calculations and, with a patch, increase the MTU above 4KiB. I have
> measured write performance up to around 35 GB/s in local loopback on a
> single 12-core box (AMD 3900x) using 12 IO threads, a 16KiB MTU, and ICRC
> disabled, for 1MB messages. This is on head of tree with some patches not
> yet upstream.
> 
> Bob Pearson
> rpearsonhpe@gmail.com
> rpearson@hpe.com
> 
> 
> -----Original Message-----
> From: Christian Blume <chr.blume@gmail.com>
> Sent: Wednesday, February 9, 2022 9:34 PM
> To: RDMA mailing list <linux-rdma@vger.kernel.org>
> Subject: Soft-RoCE performance
> 
> Hello!
> 
> I am seeing that Soft-RoCE has much lower throughput than TCP. Is that
> expected? If not, are there typical config parameters I can fiddle with?
> 
> When running iperf I am getting around 300MB/s whereas it's only around
> 100MB/s using ib_write_bw from perftests.
> 
> This is between two machines running Ubuntu 20.04 with the 5.11 kernel.
> 
> Cheers,
> Christian

This reminds me of a discussion we had a while ago; see https://patchwork.kernel.org/project/linux-rdma/patch/20200414144822.2365-1-bmt@zurich.ibm.com/

Running on TCP and implementing iWarp, siw suffers from the same problem.
Maybe it makes sense to look into a generic solution for software-based
RDMA implementations, potentially using the existing RDMA core
infrastructure?

Back then, we proposed using a spare protocol bit to do GSO signaling.
Krishna extended that idea to an MTU size negotiation using multiple spare
bits.
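
For illustration only, such an encoding could be as small as a power-of-two exponent in a few spare bits; the layout, names, and 4KiB base below are hypothetical, not the actual proposal:

    #include <stdint.h>
    #include <stdio.h>

    #define MTU_EXP_MASK 0x7u   /* 3 spare bits: exponents 0..7 */

    /* advertised MTU = 4KiB << exp, so 3 bits cover 4KiB..512KiB */
    static uint8_t mtu_to_bits(uint32_t mtu)
    {
        uint8_t exp = 0;
        while ((4096u << exp) < mtu && exp < MTU_EXP_MASK)
            exp++;
        return exp & MTU_EXP_MASK;
    }

    static uint32_t bits_to_mtu(uint8_t bits)
    {
        return 4096u << (bits & MTU_EXP_MASK);
    }

    int main(void)
    {
        /* each peer advertises; both sides adopt the minimum offer */
        uint32_t a = bits_to_mtu(mtu_to_bits(65536));   /* offers 64KiB */
        uint32_t b = bits_to_mtu(mtu_to_bits(16384));   /* offers 16KiB */
        printf("negotiated MTU: %u\n", a < b ? a : b);  /* -> 16384 */
        return 0;
    }

A peer that never sets the bits advertises exponent 0 and the link stays at today's 4KiB, so a scheme along these lines would be backward compatible.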

Another idea was to use the rdma netlink protocol for those settings.
That could also cover toggling the CRC calculation. iWarp allows for that
negotiation, but there is no API for it. Control could be provided per
interface, or per QP ID, or both (I'd prefer both). With the rxe driver
now coming up with a similar thing, I tend to prefer such a generic
solution, even if it further complicates the common user's RDMA setup.

What do others think?

Thanks,
Bernard.


* Re: Soft-RoCE performance
  2022-02-10 14:04   ` Yanjun Zhu
  2022-02-10 22:23     ` Christian Blume
@ 2022-02-11 17:48     ` Bob Pearson
  1 sibling, 0 replies; 7+ messages in thread
From: Bob Pearson @ 2022-02-11 17:48 UTC (permalink / raw)
  To: Yanjun Zhu, Pearson, Robert B, Christian Blume, RDMA mailing list

On 2/10/22 08:04, Yanjun Zhu wrote:

> 
> Thanks, I have also reached similarly high bandwidth with the same methods.
> How about the latency of soft RoCE?
> 
> Zhu Yanjun

In loopback on my system with ib_write_lat I see the following. It isn't very exciting, but the use case I am interested in doesn't require super-low latency.


 #bytes  #iterations  t_min[usec]  t_max[usec]  t_typical[usec]  t_avg[usec]  t_stdev[usec]  99% percentile[usec]  99.9% percentile[usec]
 2       1000         1.57         9.65         1.84             2.04         0.69           5.02                  9.65
 4       1000         1.58         7.89         1.82             1.99         0.54           4.68                  7.89
 8       1000         1.60         6.86         1.79             1.95         0.51           4.08                  6.86
 16      1000         1.59         5.28         1.83             1.95         0.42           3.72                  5.28
 32      1000         1.64         8.18         1.84             1.99         0.52           4.07                  8.18
 64      1000         1.65         8.58         1.84             1.99         0.48           3.92                  8.58
 128     1000         1.80         10.58        1.96             2.10         0.47           3.94                  10.58
 256     1000         1.76         12.54        2.00             2.17         0.52           4.00                  12.54
 512     1000         1.69         11.09        1.95             2.21         0.66           3.97                  11.09
 1024    1000         1.80         12.91        2.00             2.22         0.56           3.89                  12.91
 2048    1000         2.04         9.29         2.20             2.37         0.50           4.03                  9.29
 4096    1000         2.39         11.66        2.58             2.76         0.56           4.67                  11.66



Thread overview: 7+ messages
2022-02-10  3:33 Soft-RoCE performance Christian Blume
2022-02-10  5:13 ` Pearson, Robert B
2022-02-10  5:28   ` Pearson, Robert B
2022-02-10 14:04   ` Yanjun Zhu
2022-02-10 22:23     ` Christian Blume
2022-02-11 17:48     ` Bob Pearson
2022-02-11 10:53   ` Bernard Metzler
