* Is SCTP throughput really this low compared to TCP?
@ 2014-04-10 19:12 Butler, Peter
  2014-04-10 20:21 ` Vlad Yasevich
                   ` (35 more replies)
  0 siblings, 36 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-10 19:12 UTC (permalink / raw)
  To: linux-sctp

I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?

All TCP/SCTP tests were performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.

The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
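
For reference, the simulated latencies were added with netem, along these lines (interface name and delay value illustrative):

  tc qdisc add dev eth2 root netem delay 10ms   # add 10 ms of one-way delay
  tc qdisc del dev eth2 root netem              # remove it after the test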

In addition, each of these network scenarios was tested using various kernel socket buffer sizes, ranging from the kernel default (100-200 kB) up to several MB for send and receive buffers, with multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
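
The buffer sizing was along these lines: raise the kernel ceilings, then have the test applications request the buffers (values illustrative):

  sysctl -w net.core.rmem_max=4194304   # allow receive buffers up to 4 MB
  sysctl -w net.core.wmem_max=4194304   # allow send buffers up to 4 MB
  # applications then request buffers via SO_SNDBUF/SO_RCVBUF
  # (e.g. netperf -s/-S, iperf3 -w)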

Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.

The TCP throughput is about 3x higher than that of SCTP in the best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.




* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
@ 2014-04-10 20:21 ` Vlad Yasevich
  2014-04-10 20:40 ` Butler, Peter
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-10 20:21 UTC (permalink / raw)
  To: linux-sctp

On 04/10/2014 03:12 PM, Butler, Peter wrote:
> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
> 
> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
> 
> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
> 
> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
> 
> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
> 
> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
> 
> 

To do a more apples-to-apples comparison, you need to disable tso/gso
on the sending node.

The reason is that even if you limit buffer sizes, tcp will still try to
do tso on the transmit side, thus coalescing your 1000-byte messages into
something much larger, thus utilizing your MTU much more efficiently.

SCTP, on the other hand, has to preserve message boundaries which
results in sub-optimal mtu utilization when using 1000-byte payloads.

My recommendation is to use 1464-byte messages for SCTP on a NIC with a
1500-byte MTU.
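
Something along these lines on the sender (interface name illustrative):

  ethtool -K eth2 tso off gso off   # disable segmentation offloads for the test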

I would be interested to see the results.  There could very well be issues.

-vlad




* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
  2014-04-10 20:21 ` Vlad Yasevich
@ 2014-04-10 20:40 ` Butler, Peter
  2014-04-10 21:00 ` Vlad Yasevich
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-10 20:40 UTC (permalink / raw)
  To: linux-sctp

Thanks - I will give that a try.  What about generic-receive-offload and large-receive-offload?

-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: April-10-14 4:21 PM
To: Butler, Peter; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/10/2014 03:12 PM, Butler, Peter wrote:
> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
> 
> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
> 
> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
> 
> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
> 
> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
> 
> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
> 
> 

To do a more of apples-to-apples comparison, you need to disable tso/gso on the sending node.

The reason is that even if you limit buffer sizes, tcp will still try to do tso on the transmit size, thus coalescing you 1000-byte messages into something much larger, thus utilizing your MTU much more efficiently.

SCTP, on the other hand, has to preserve message boundaries which results in sub-optimal mtu utilization when using 1000-byte payloads.

My recommendation is to use 1464 byte message for SCTP on a 1500 byte MTU nic.

I would be interested to see the results.  There could very well be issues.

-vlad




* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
  2014-04-10 20:21 ` Vlad Yasevich
  2014-04-10 20:40 ` Butler, Peter
@ 2014-04-10 21:00 ` Vlad Yasevich
  2014-04-11  7:42 ` Daniel Borkmann
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-10 21:00 UTC (permalink / raw)
  To: linux-sctp

On 04/10/2014 04:40 PM, Butler, Peter wrote:
> Thanks - I will give that a try.  What about generic-receive-offload and large-receive-offload ?

They do help tcp a bit by allowing it to ack more data in one shot.
If they are on, it might make sense to turn them off.

I suppose sctp could benefit from GRO a bit...
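
Turning them off would be along the same lines (interface name illustrative):

  ethtool -K eth2 gro off lro off   # disable receive-side coalescing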

-vlad

> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
> Sent: April-10-14 4:21 PM
> To: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>
>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>
>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>
>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>
>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>
>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>>
> 
> To do a more of apples-to-apples comparison, you need to disable tso/gso on the sending node.
> 
> The reason is that even if you limit buffer sizes, tcp will still try to do tso on the transmit size, thus coalescing you 1000-byte messages into something much larger, thus utilizing your MTU much more efficiently.
> 
> SCTP, on the other hand, has to preserve message boundaries which results in sub-optimal mtu utilization when using 1000-byte payloads.
> 
> My recommendation is to use 1464 byte message for SCTP on a 1500 byte MTU nic.
> 
> I would be interested to see the results.  There could very well be issues.
> 
> -vlad
> 
> 



* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (2 preceding siblings ...)
  2014-04-10 21:00 ` Vlad Yasevich
@ 2014-04-11  7:42 ` Daniel Borkmann
  2014-04-11 15:07 ` Butler, Peter
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11  7:42 UTC (permalink / raw)
  To: linux-sctp

Hi Peter,

On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>
>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>
>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>
>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>
>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>
>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>
> To do a more of apples-to-apples comparison, you need to disable tso/gso
> on the sending node.
>
> The reason is that even if you limit buffer sizes, tcp will still try to
> do tso on the transmit size, thus coalescing you 1000-byte messages into
> something much larger, thus utilizing your MTU much more efficiently.
>
> SCTP, on the other hand, has to preserve message boundaries which
> results in sub-optimal mtu utilization when using 1000-byte payloads.
>
> My recommendation is to use 1464 byte message for SCTP on a 1500 byte
> MTU nic.
>
> I would be interested to see the results.  There could very well be issues.

Agreed.

Also, what NIC are you using? It seems only Intel provides SCTP checksum
offloading so far, i.e. ixgbe/i40e NICs.
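
You can check what the NIC advertises with something like:

  ethtool -k eth2 | grep sctp   # look for tx-checksum-sctp: on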

> -vlad


* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (3 preceding siblings ...)
  2014-04-11  7:42 ` Daniel Borkmann
@ 2014-04-11 15:07 ` Butler, Peter
  2014-04-11 15:21 ` Daniel Borkmann
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 15:07 UTC (permalink / raw)
  To: linux-sctp

Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled.  (For what it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)

I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
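
(The 1452 limit is consistent with the per-packet overheads, assuming IPv4 here:

  # 1500 (MTU) - 20 (IPv4 header) - 12 (SCTP common header) - 16 (DATA chunk header) = 1452

i.e. the 12-byte SCTP common header is presumably what the 1464 suggestion did not account for.)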

So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.

Does this value (i.e. 40-70%) sound reasonable?  Is this the more-or-less accepted performance difference with the current LKSCTP implementation?

Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...



-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com] 
Sent: April-11-14 3:43 AM
To: Vlad Yasevich
Cc: Butler, Peter; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

Hi Peter,

On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>
>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>
>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>
>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>
>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>
>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>
> To do a more of apples-to-apples comparison, you need to disable 
> tso/gso on the sending node.
>
> The reason is that even if you limit buffer sizes, tcp will still try 
> to do tso on the transmit size, thus coalescing you 1000-byte messages 
> into something much larger, thus utilizing your MTU much more efficiently.
>
> SCTP, on the other hand, has to preserve message boundaries which 
> results in sub-optimal mtu utilization when using 1000-byte payloads.
>
> My recommendation is to use 1464 byte message for SCTP on a 1500 byte 
> MTU nic.
>
> I would be interested to see the results.  There could very well be issues.

Agreed.

Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.

> -vlad


* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (4 preceding siblings ...)
  2014-04-11 15:07 ` Butler, Peter
@ 2014-04-11 15:21 ` Daniel Borkmann
  2014-04-11 15:27 ` Vlad Yasevich
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 15:21 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 05:07 PM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled.  (For what
> it's worth, the checksum offload gives about a 20% throughput gain - but this is,
> of course, already included in the numbers I posted to this thread as I've been using
> the CRC offload all along.)

Ok, understood.

> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association -
> i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte
> messages.  With this new setup, the TCP performance drops significantly, as expected,
> while the SCTP performance is boosted, and the playing field is somewhat more 'level'.
> (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above
> 1452 cut the SCTP performance in half - must have hit the segmentation limit at this
> slightly lower message size.  MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70%
> over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms,
> and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3
> times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable?  Is this the more-or-less accepted
> performance difference with the current LKSCTP implementation?

Yes, that sounds reasonable to me. There are still a lot of open todos in terms of
performance that we need to tackle over time: e.g. the way chunks are handled (imho),
the copies involved in the fast path, and our heavy use of atomic reference counting,
among other open issues.

> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel
> (3.4.2) than with the newer kernel (3.14)...

Interesting, a lot of things happened in between; were you able to bisect/identify
a possible commit that causes this? How big is the difference?


* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (5 preceding siblings ...)
  2014-04-11 15:21 ` Daniel Borkmann
@ 2014-04-11 15:27 ` Vlad Yasevich
  2014-04-11 15:35 ` Daniel Borkmann
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 15:27 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled.  (For what it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
> 
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
> 
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
> 
> Does this value (i.e. 40-70%) sound reasonable? 

This still looks high.  Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in sctp?

My guess is that a lot of it is going to be in memcpy(), but I am
curious.
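
Something like this on the sender while the test is running, assuming perf
is available there:

  perf record -a -g sleep 30   # sample all CPUs for 30 s, with call graphs
  perf report                  # then see where the cycles go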

> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
> 
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
> 

That's interesting.  I'll have to look and see what might have changed here.

-vlad

> 
> 
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com] 
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> Hi Peter,
> 
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do a more of apples-to-apples comparison, you need to disable 
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try 
>> to do tso on the transmit size, thus coalescing you 1000-byte messages 
>> into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which 
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte 
>> MTU nic.
>>
>> I would be interested to see the results.  There could very well be issues.
> 
> Agreed.
> 
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
> 
>> -vlad



* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (6 preceding siblings ...)
  2014-04-11 15:27 ` Vlad Yasevich
@ 2014-04-11 15:35 ` Daniel Borkmann
  2014-04-11 18:19 ` Vlad Yasevich
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 15:35 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 05:27 PM, Vlad Yasevich wrote:
> On 04/11/2014 11:07 AM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled.  (For what it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
>>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable?
>
> This still looks high.  Could you run 'perf record -a' and 'perf report'
> to see where we are spending all of our time in sctp.

+1

> My guess is that a lot of it is going to be in memcpy(), but I am
> curious.
>
>> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>>
>> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
>
> That's interesting.  I'll have to look at see what might have changed here.

I remember Fengguang's tests reporting about the one below (from 3.11),
but the starting baseline was already quite low ...

commit ef2820a735f74ea60335f8ba3801b844f0cb184d
Author: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Date:   Fri Feb 14 14:51:18 2014 +0100

     net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer

> -vlad
>
>>
>>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 3:43 AM
>> To: Vlad Yasevich
>> Cc: Butler, Peter; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> Hi Peter,
>>
>> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>>
>>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>>
>>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>>
>>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>>
>>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>>
>>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>>
>>> To do a more of apples-to-apples comparison, you need to disable
>>> tso/gso on the sending node.
>>>
>>> The reason is that even if you limit buffer sizes, tcp will still try
>>> to do tso on the transmit size, thus coalescing you 1000-byte messages
>>> into something much larger, thus utilizing your MTU much more efficiently.
>>>
>>> SCTP, on the other hand, has to preserve message boundaries which
>>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>>
>>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte
>>> MTU nic.
>>>
>>> I would be interested to see the results.  There could very well be issues.
>>
>> Agreed.
>>
>> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>>
>>> -vlad
>


* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (7 preceding siblings ...)
  2014-04-11 15:35 ` Daniel Borkmann
@ 2014-04-11 18:19 ` Vlad Yasevich
  2014-04-11 18:22 ` Butler, Peter
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 18:19 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled.  (For what it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
> 
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
> 
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
> 
> Does this value (i.e. 40-70%) sound reasonable?  Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
> 
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
> 

Hi Peter

Could you run an experiment setting max_burst to 0?

The way the SCTP spec is written, upon every SACK, if the
stack has new data to send, it will only send max_burst*mtu
bytes.  This may not always fill the current congestion window
and might preclude growth.

Setting max_burst to 0 will disable burst limitation.  I am
curious to see if this would impact throughput on a low-rtt
link like yours.
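
The knob is a sysctl, so something like:

  sysctl -w net.sctp.max_burst=0
  # equivalently: echo 0 > /proc/sys/net/sctp/max_burst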

-vlad


> 
> 
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com] 
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> Hi Peter,
> 
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do a more of apples-to-apples comparison, you need to disable 
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try 
>> to do tso on the transmit size, thus coalescing you 1000-byte messages 
>> into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which 
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte 
>> MTU nic.
>>
>> I would be interested to see the results.  There could very well be issues.
> 
> Agreed.
> 
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
> 
>> -vlad



* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (8 preceding siblings ...)
  2014-04-11 18:19 ` Vlad Yasevich
@ 2014-04-11 18:22 ` Butler, Peter
  2014-04-11 18:40 ` Daniel Borkmann
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 18:22 UTC (permalink / raw)
  To: linux-sctp

The difference between 3.14 and 3.4.2 is staggering - an order of magnitude or so.  For example, using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage between 70-150 Mbps with 3.14.

Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such that it is always trying to recover.  For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to the next (as you would expect):

[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
Time: Fri, 11 Apr 2014 18:19:15 GMT
Connecting to host 192.168.241.3, port 5201
      Cookie: Lab200slot2.1397240355.069035.0d5b0f
[  4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   255 MBytes  2.14 Gbits/sec
[  4]   1.00-2.00   sec   253 MBytes  2.12 Gbits/sec
[  4]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec
[  4]   3.00-4.00   sec   255 MBytes  2.14 Gbits/sec
[  4]   4.00-5.00   sec   255 MBytes  2.14 Gbits/sec
[  4]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec
[  4]   6.00-7.00   sec   253 MBytes  2.13 Gbits/sec
[  4]   7.00-8.00   sec   254 MBytes  2.13 Gbits/sec
[  4]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec
[  4]   9.00-10.00  sec   252 MBytes  2.12 Gbits/sec
(etc)

but with 3.14 the numbers are all over the place:

[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 17:56:21 GMT
Connecting to host 192.168.241.3, port 5201
      Cookie: Lab200slot2.1397238981.812898.548918
[  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
[  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
[  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
[  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
[  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
[  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
[  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
[  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
[  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
[  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
[  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
[  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
[  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
[  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
[  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
[  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
[  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
(etc)

Note: the difference appears to be SCTP-specific, as I get exactly the same TCP throughput in both kernels. 




-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com] 
Sent: April-11-14 11:22 AM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 05:07 PM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading 
> enabled.  (For what
> it's worth, the checksum offload gives about a 20% throughput gain - but this is,
> of course, already included in the numbers I posted to this thread as I've been using
> the CRC offload all along.)

Ok, understood.

> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of 
> the association -
> i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte
> messages.  With this new setup, the TCP performance drops significantly, as expected,
> while the SCTP performance is boosted, and the playing field is somewhat more 'level'.
> (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above
> 1452 cut the SCTP performance in half - must have hit the segmentation limit at this
> slightly lower message size.  MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by 
> approximately 40-70%
> over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms,
> and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3
> times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable?  Is this the 
> more-or-less accepted
> performance difference with the current LKSCTP implementation?

Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time, e.g. the way chunks are handled, imho, and copies involved in fast path, also that we're heavily using atomic reference counting, and other open issues.

> Also, for what it's worth, I get better SCTP throughput numbers with 
> the older kernel
> (3.4.2) than with the newer kernel (3.14)...

Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?


* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (9 preceding siblings ...)
  2014-04-11 18:22 ` Butler, Peter
@ 2014-04-11 18:40 ` Daniel Borkmann
  2014-04-11 18:41 ` Daniel Borkmann
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 18:40 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 08:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering.  An order of magnitude or so.  For example,
> using the precisely same setup as before, whereas I get about 2.1 Gbps throughput with 3.4 2, I
> can only manage between 70-150 Mbps with 3.14 - a staggering difference.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such that it is always trying to
> recover.  For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to
> the next (as you would expect):
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
> Time: Fri, 11 Apr 2014 18:19:15 GMT
> Connecting to host 192.168.241.3, port 5201
>        Cookie: Lab200slot2.1397240355.069035.0d5b0f
> [  4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   1.00-2.00   sec   253 MBytes  2.12 Gbits/sec
> [  4]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   3.00-4.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   4.00-5.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec
> [  4]   6.00-7.00   sec   253 MBytes  2.13 Gbits/sec
> [  4]   7.00-8.00   sec   254 MBytes  2.13 Gbits/sec
> [  4]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   9.00-10.00  sec   252 MBytes  2.12 Gbits/sec
> (etc)
>
> but with 3.14 the numbers as all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
>        Cookie: Lab200slot2.1397238981.812898.548918
> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the same TCP
> throughput in both kernels.

Hmm, okay. :/ Could you further bisect on your side to narrow down from which
kernel onwards this behaviour can be seen?
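
A sketch of the bisection, assuming the mainline tags as endpoints:

  git bisect start v3.14 v3.4   # bad kernel first, then last known-good
  # build/boot the kernel git offers, re-run the iperf3 test, then mark it:
  git bisect good               # or 'git bisect bad', repeat until done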

Thanks,

Daniel

> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 11:22 AM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled.  (For what
>> it's worth, the checksum offload gives about a 20% throughput gain - but this is,
>> of course, already included in the numbers I posted to this thread as I've been using
>> the CRC offload all along.)
>
> Ok, understood.
>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of
>> the association -
>> i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte
>> messages.  With this new setup, the TCP performance drops significantly, as expected,
>> while the SCTP performance is boosted, and the playing field is somewhat more 'level'.
>> (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above
>> 1452 cut the SCTP performance in half - must have hit the segmentation limit at this
>> slightly lower message size.  MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>> approximately 40-70%
>> over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms,
>> and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3
>> times the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable?  Is this the
>> more-or-less accepted
>> performance difference with the current LKSCTP implementation?
>
> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time, e.g. the way chunks are handled, imho, and copies involved in fast path, also that we're heavily using atomic reference counting, and other open issues.
>
>> Also, for what it's worth, I get better SCTP throughput numbers with
>> the older kernel
>> (3.4.2) than with the newer kernel (3.14)...
>
> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>


* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (10 preceding siblings ...)
  2014-04-11 18:40 ` Daniel Borkmann
@ 2014-04-11 18:41 ` Daniel Borkmann
  2014-04-11 18:58 ` Butler, Peter
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 18:41 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 08:40 PM, Daniel Borkmann wrote:
> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering.  An order of magnitude or so.  For example,
>> using the precisely same setup as before, whereas I get about 2.1 Gbps throughput with 3.4 2, I
>> can only manage between 70-150 Mbps with 3.14 - a staggering difference.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such that it is always trying to
>> recover.  For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to
>> the next (as you would expect):
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>> Connecting to host 192.168.241.3, port 5201
>>        Cookie: Lab200slot2.1397240355.069035.0d5b0f
>> [  4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   1.00-2.00   sec   253 MBytes  2.12 Gbits/sec
>> [  4]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   3.00-4.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   4.00-5.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec
>> [  4]   6.00-7.00   sec   253 MBytes  2.13 Gbits/sec
>> [  4]   7.00-8.00   sec   254 MBytes  2.13 Gbits/sec
>> [  4]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   9.00-10.00  sec   252 MBytes  2.12 Gbits/sec
>> (etc)
>>
>> but with 3.14 the numbers as all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>>        Cookie: Lab200slot2.1397238981.812898.548918
>> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
>> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
>> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
>> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
>> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
>> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
>> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
>> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
>> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
>> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
>> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly the same TCP
>> throughput in both kernels.
>
> Hmm, okay. :/ Could you further bisect on your side to narrow down from which
> kernel onwards this behaviour can be seen?

Is that behaviour consistent between IPv4 and IPv6?


* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (11 preceding siblings ...)
  2014-04-11 18:41 ` Daniel Borkmann
@ 2014-04-11 18:58 ` Butler, Peter
  2014-04-11 19:16 ` Butler, Peter
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 18:58 UTC (permalink / raw)
  To: linux-sctp

I have the perf data (operf/opreport) and am trying to send it out - but our email server is rejecting it with "Reason: content policy violation".  I've contacted the IT help desk and will get it out to this mailing list ASAP.



-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: April-11-14 11:28 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading 
> enabled.  (For what it's worth, the checksum offload gives about a 20% 
> throughput gain - but this is, of course, already included in the 
> numbers I posted to this thread as I've been using the CRC offload all 
> along.)
> 
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
> 
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
> 
> Does this value (i.e. 40-70%) sound reasonable? 

This still looks high.  Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in sctp.

My guess is that a lot of it is going to be in memcpy(), but I am curious.

> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
> 
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
> 

That's interesting.  I'll have to look at see what might have changed here.

-vlad

> 
> 
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> Hi Peter,
> 
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do a more of apples-to-apples comparison, you need to disable 
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try 
>> to do tso on the transmit size, thus coalescing you 1000-byte 
>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which 
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte 
>> MTU nic.
>>
>> I would be interested to see the results.  There could very well be issues.
> 
> Agreed.
> 
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
> 
>> -vlad



* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (12 preceding siblings ...)
  2014-04-11 18:58 ` Butler, Peter
@ 2014-04-11 19:16 ` Butler, Peter
  2014-04-11 19:20 ` Vlad Yasevich
                   ` (21 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 19:16 UTC (permalink / raw)
  To: linux-sctp

What an excellent question.  I just tried that now and you are definitely on to something.  Whereas IPv4 is erratic on 3.14 (for SCTP), IPv6 is fairly smooth (see results below).

However, note that as previously mentioned I still get better throughput numbers with 3.4.2.  For the low-latency (0.2 ms) test, the 3.4.2 kernel yields about 2.1 Gbps, whereas the 3.14 kernel yields only about 1.6 Gbps.

These results show that the erratic behaviour seen with kernel 3.14 appears to be confined to IPv4 SCTP only:

IPv6:

[root@Lab200slot2 ~]#  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 19:08:41 GMT
Connecting to host 2001:db8:0:f101::1, port 5201
      Cookie: Lab200slot2.1397243321.714295.2b3f7c
[  4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec
[  4]   1.00-2.00   sec   201 MBytes  1.69 Gbits/sec
[  4]   2.00-3.00   sec   188 MBytes  1.58 Gbits/sec
[  4]   3.00-4.00   sec   174 MBytes  1.46 Gbits/sec
[  4]   4.00-5.00   sec   165 MBytes  1.39 Gbits/sec
[  4]   5.00-6.00   sec   199 MBytes  1.67 Gbits/sec
[  4]   6.00-7.00   sec   163 MBytes  1.36 Gbits/sec
[  4]   7.00-8.00   sec   174 MBytes  1.46 Gbits/sec
[  4]   8.00-9.00   sec   193 MBytes  1.62 Gbits/sec
[  4]   9.00-10.00  sec   196 MBytes  1.65 Gbits/sec
[  4]  10.00-11.00  sec   157 MBytes  1.31 Gbits/sec
[  4]  11.00-12.00  sec   175 MBytes  1.47 Gbits/sec
[  4]  12.00-13.00  sec   192 MBytes  1.61 Gbits/sec
[  4]  13.00-14.00  sec   199 MBytes  1.67 Gbits/sec
(etc)



IPv4:

[root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1400 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 19:09:28 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1397243368.815040.7ecb3d
[  4] local 192.168.240.2 port 36273 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   221 MBytes  1.85 Gbits/sec
[  4]   1.00-2.42   sec  91.3 MBytes   541 Mbits/sec
[  4]   2.42-3.00   sec   127 MBytes  1.83 Gbits/sec
[  4]   3.00-4.00   sec   216 MBytes  1.81 Gbits/sec
[  4]   4.00-5.51   sec   111 MBytes   617 Mbits/sec
[  4]   5.51-6.75   sec  54.0 MBytes   365 Mbits/sec
[  4]   6.75-7.00   sec  57.4 MBytes  1.89 Gbits/sec
[  4]   7.00-9.55   sec   121 MBytes   399 Mbits/sec
[  4]   9.55-9.56   sec  0.00 Bytes  0.00 bits/sec
[  4]   9.56-10.00  sec  99.7 MBytes  1.88 Gbits/sec
[  4]  10.00-11.00  sec   220 MBytes  1.85 Gbits/sec
[  4]  11.00-12.34  sec  74.3 MBytes   466 Mbits/sec
(etc)




-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com] 
Sent: April-11-14 2:42 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 08:40 PM, Daniel Borkmann wrote:
> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering.  An order of
>> magnitude or so.  For example, using precisely the same setup as before,
>> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
>> between 70-150 Mbps with 3.14 - a staggering difference.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>> that it is always trying to recover.  For example, with 3.4.2 the 2.1
>> Gbps throughput is quite consistent from one second to the next (as you
>> would expect):
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>> Connecting to host 192.168.241.3, port 5201
>>        Cookie: Lab200slot2.1397240355.069035.0d5b0f
>> [  4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   1.00-2.00   sec   253 MBytes  2.12 Gbits/sec
>> [  4]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   3.00-4.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   4.00-5.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec
>> [  4]   6.00-7.00   sec   253 MBytes  2.13 Gbits/sec
>> [  4]   7.00-8.00   sec   254 MBytes  2.13 Gbits/sec
>> [  4]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   9.00-10.00  sec   252 MBytes  2.12 Gbits/sec
>> (etc)
>>
>> but with 3.14 the numbers are all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>>        Cookie: Lab200slot2.1397238981.812898.548918
>> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
>> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
>> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
>> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
>> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
>> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
>> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
>> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
>> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
>> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
>> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly
>> the same TCP throughput in both kernels.
>
> Hmm, okay. :/ Could you further bisect on your side to narrow down 
> from which kernel onwards this behaviour can be seen?

Is that behaviour consistent between IPv4 and IPv6?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (13 preceding siblings ...)
  2014-04-11 19:16 ` Butler, Peter
@ 2014-04-11 19:20 ` Vlad Yasevich
  2014-04-11 19:24 ` Butler, Peter
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 19:20 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 02:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering.  An order of
> magnitude or so.  For example, using precisely the same setup as before,
> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
> between 70-150 Mbps with 3.14 - a staggering difference.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover.  For example, with 3.4.2 the 2.1
> Gbps throughput is quite consistent from one second to the next (as you
> would expect):
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
>       Cookie: Lab200slot2.1397238981.812898.548918
> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the
> same TCP throughput in both kernels.
>

Ouch.  That is not very good behavior...  I wonder if this is
a side-effect of the new rwnd algorithm...

In fact, I think I do see a small problem with the algorithm.

Can you try this patch:

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8d198ae..c17592a 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -738,7 +738,7 @@ struct sctp_ulpevent *sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
 	 * Since this is a clone of the original skb, only account for
 	 * the data of this chunk as other chunks will be accounted separately.
 	 */
-	sctp_ulpevent_init(event, 0, skb->len + sizeof(struct sk_buff));
+	sctp_ulpevent_init(event, 0, skb->len);

 	sctp_ulpevent_receive_data(event, asoc);

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (14 preceding siblings ...)
  2014-04-11 19:20 ` Vlad Yasevich
@ 2014-04-11 19:24 ` Butler, Peter
  2014-04-11 20:14 ` Butler, Peter
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 19:24 UTC (permalink / raw)
  To: linux-sctp

I can certainly try that patch; however, see the previous email where Daniel suggested that the issue may be IPv4-only.  I have since tested it (email sent out 5 minutes ago) and he was right: IPv6 is smooth, whereas IPv4 is erratic.

Although even when using the smooth IPv6 behaviour, the 3.4.2 throughput is still better than 3.14; for example, 2.1 Gbps in the 'no' latency case (0.2 ms RTT) on 3.4.2 but only 1.6 Gbps with 3.14.

Should I try out the patch, or does the IPv4 vs IPv6 data shed new light on this?
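
For anyone following along, a minimal sketch of applying and testing a patch like the one Vlad posted, assuming the hunk was saved from the mail as rwnd-test.diff and the tree lives in ~/linux (both names are assumptions, not from this thread):

  cd ~/linux
  patch -p1 --dry-run < rwnd-test.diff      # verify it applies cleanly first
  patch -p1 < rwnd-test.diff
  make -j8 && make modules_install install  # then reboot into the patched kernel and re-run iperf3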




-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: April-11-14 3:21 PM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 02:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering.  An order of
> magnitude or so.  For example, using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage between 70-150 Mbps with 3.14 - a staggering difference.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover.  For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to the next (as you would expect):
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
>       Cookie: Lab200slot2.1397238981.812898.548918
> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the
> same TCP throughput in both kernels.
>

Ouch.  That is not very good behavior...  I wonder if this is a side-effect of the new rwnd algorithm...

In fact, I think I do see a small problem with the algorithm.

Can you try this patch:

<snipped>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (15 preceding siblings ...)
  2014-04-11 19:24 ` Butler, Peter
@ 2014-04-11 20:14 ` Butler, Peter
  2014-04-11 20:18 ` Butler, Peter
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 20:14 UTC (permalink / raw)
  To: linux-sctp

I tried setting /proc/sys/net/sctp/max_burst to 0 on both nodes (both sides of the association): doing so prevented iperf from sending any traffic at all.  Only the initial 4-way handshake and subsequent heartbeat packets were transmitted.  When I kill the process (client and/or server) it reports 0 bits/s throughput.

I then tried various different values from 1 to 16 (default was 4), they all result in about the same 2.1 Gbps throughput (for the low-latency (0.2 ms RTT) case) as originally reported.
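
A minimal sketch of the sweep described above, assuming the iperf3 server from this thread is still listening on 192.168.241.3 (as noted, the sysctl has to be applied on both nodes; this covers the local side only):

  for burst in 0 1 2 4 8 16; do
          sysctl -w net.sctp.max_burst=$burst   # same as writing /proc/sys/net/sctp/max_burst
          iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 30
  done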




-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: April-11-14 2:20 PM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading 
> enabled.  (For what it's worth, the checksum offload gives about a 20% 
> throughput gain - but this is, of course, already included in the 
> numbers I posted to this thread as I've been using the CRC offload all 
> along.)
> 
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
> 
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
> 
> Does this value (i.e. 40-70%) sound reasonable?  Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
> 
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
> 

Hi Peter

Could you run an experiment setting max_burst to 0?

The way the SCTP spec is written, upon every SACK, if the stack has new data to send, it will only send max_burst*mtu bytes.  This may not always fill the current congestion window and might preclude growth.

Setting max_burst to 0 will disable burst limitation.  I am curious to see if this would impact throughput on a low-rtt link like yours.

-vlad


> 
> 
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> Hi Peter,
> 
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do more of an apples-to-apples comparison, you need to disable
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try
>> to do tso on the transmit side, thus coalescing your 1000-byte
>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which 
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte 
>> MTU nic.
>>
>> I would be interested to see the results.  There could very well be issues.
> 
> Agreed.
> 
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
> 
>> -vlad


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (16 preceding siblings ...)
  2014-04-11 20:14 ` Butler, Peter
@ 2014-04-11 20:18 ` Butler, Peter
  2014-04-11 20:51 ` Vlad Yasevich
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 20:18 UTC (permalink / raw)
  To: linux-sctp

It may take a little time to do the bisection.  Can you provide me with a guesstimate as to which kernel(s) may have introduced this behaviour?


-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com] 
Sent: April-11-14 2:40 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 08:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering.  An order of
> magnitude or so.  For example, using precisely the same setup as before,
> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
> between 70-150 Mbps with 3.14 - a staggering difference.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover.  For example, with 3.4.2 the 2.1
> Gbps throughput is quite consistent from one second to the next (as you
> would expect):
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
> Time: Fri, 11 Apr 2014 18:19:15 GMT
> Connecting to host 192.168.241.3, port 5201
>        Cookie: Lab200slot2.1397240355.069035.0d5b0f
> [  4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   1.00-2.00   sec   253 MBytes  2.12 Gbits/sec
> [  4]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   3.00-4.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   4.00-5.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec
> [  4]   6.00-7.00   sec   253 MBytes  2.13 Gbits/sec
> [  4]   7.00-8.00   sec   254 MBytes  2.13 Gbits/sec
> [  4]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   9.00-10.00  sec   252 MBytes  2.12 Gbits/sec
> (etc)
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
>        Cookie: Lab200slot2.1397238981.812898.548918
> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the
> same TCP throughput in both kernels.

Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?

Thanks,

Daniel

> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 11:22 AM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled.  (For what it's worth, the checksum offload gives about a 20%
>> throughput gain - but this is, of course, already included in the
>> numbers I posted to this thread as I've been using the CRC offload all
>> along.)
>
> Ok, understood.
>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of
>> the association - i.e. on both endpoint nodes), and using 1452-byte
>> messages instead of 1000-byte messages.  With this new setup, the TCP
>> performance drops significantly, as expected, while the SCTP
>> performance is boosted, and the playing field is somewhat more 'level'.
>> (Note that I could not use 1464-byte messages as suggested by Vlad, as
>> anything above 1452 cut the SCTP performance in half - must have hit
>> the segmentation limit at this slightly lower message size.  MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>> approximately 40-70% over the various range of network latencies I
>> tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still
>> significant, but nowhere near the 200% better (i.e. 3 times the
>> throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable?  Is this the
>> more-or-less accepted performance difference with the current LKSCTP
>> implementation?
>
> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time, e.g. the way chunks are handled, imho, and copies involved in fast path, also that we're heavily using atomic reference counting, and other open issues.
>
>> Also, for what it's worth, I get better SCTP throughput numbers with
>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>
> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (17 preceding siblings ...)
  2014-04-11 20:18 ` Butler, Peter
@ 2014-04-11 20:51 ` Vlad Yasevich
  2014-04-11 20:53 ` Vlad Yasevich
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 20:51 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 03:24 PM, Butler, Peter wrote:
> I can certainly try that patch, however see the previous email where Daniel suggested that the issue may be IPv4 only.  I have subsequently tested it (email sent out 5 minutes ago) and he was right: IPv6 is smooth, whereas IPv4 is erratic.
> 
> Although even when using the smooth IPv6 behaviour, the 3.4.2 throughput is still better than 3.14; for example, 2.1 Gbps in the 'no' latency case (0.2 ms RTT) on 3.4.2 but only 1.6 Gbps with 3.14.
> 
> Should I try out the patch, or does the IPv4 vs IPv6 data shed new light on this?

No, the patch is actually wrong, so don't worry about it.
The v4 vs v6 data is definitely something we need to address.

Thanks
-vlad
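
A minimal sketch of the v4-vs-v6 A/B comparison, reusing the addresses already shown earlier in this thread (substitute your own hosts):

  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1400 -t 60
  iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60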

> 
> 
> 
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
> Sent: April-11-14 3:21 PM
> To: Butler, Peter; Daniel Borkmann
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> On 04/11/2014 02:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering.  An order of
>> magnitude or so.  For example, using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage between 70-150 Mbps with 3.14 - a staggering difference.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>> that it is always trying to recover.  For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to the next (as you would expect):
>>
>> but with 3.14 the numbers are all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>>       Cookie: Lab200slot2.1397238981.812898.548918
>> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
>> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
>> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
>> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
>> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
>> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
>> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
>> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
>> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
>> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
>> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly the
>> same TCP throughput in both kernels.
>>
> 
> Ouch.  That is not very good behavior...  I wonder if this a side-effect of the new rwnd algorithm...
> 
> In fact, I think I do see a small problem with the algorithm.
> 
> Can you try this patch:
> 
> <snipped>
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (18 preceding siblings ...)
  2014-04-11 20:51 ` Vlad Yasevich
@ 2014-04-11 20:53 ` Vlad Yasevich
  2014-04-11 20:57 ` Butler, Peter
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 20:53 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 04:14 PM, Butler, Peter wrote:
> I tried setting /proc/sys/net/sctp/max_burst to 0 on both nodes (both sides of the association): doing so prevented iperf from sending any traffic at all.  Only the initial 4-way handshake and subsequent heartbeat packets were transmitted.  When I kill the process (client and/or server) it reports 0 bits/s throughput.
> 
> I then tried various different values from 1 to 16 (default was 4), they all result in about the same 2.1 Gbps throughput (for the low-latency (0.2 ms RTT) case) as originally reported.
> 
> 

I didn't realize that max_burst=0 only works in the 3.14 kernel.
However, the data for burst of 16 definitely helps.  I was concerned
that this might be a congestion window issue.

Thanks
-vlad
> 
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
> Sent: April-11-14 2:20 PM
> To: Butler, Peter; Daniel Borkmann
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> On 04/11/2014 11:07 AM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading 
>> enabled.  (For what it's worth, the checksum offload gives about a 20% 
>> throughput gain - but this is, of course, already included in the 
>> numbers I posted to this thread as I've been using the CRC offload all 
>> along.)
>>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable?  Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>>
>> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
> 
> Hi Peter
> 
> Could you run an experiment setting max_burst to 0?
> 
> The way the SCTP spec is written, upon every SACK, if the stack has new data to send, it will only send max_burst*mtu bytes.  This may not always fill the current congestion window and might preclude growth.
> 
> Setting max_burst to 0 will disable burst limitation.  I am curious to see if this would impact throughput on a low-rtt link like yours.
> 
> -vlad
> 
> 
>>
>>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 3:43 AM
>> To: Vlad Yasevich
>> Cc: Butler, Peter; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> Hi Peter,
>>
>> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>>
>>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>>
>>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>>
>>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>>
>>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>>
>>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>>
>>> To do more of an apples-to-apples comparison, you need to disable
>>> tso/gso on the sending node.
>>>
>>> The reason is that even if you limit buffer sizes, tcp will still try
>>> to do tso on the transmit side, thus coalescing your 1000-byte
>>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>>
>>> SCTP, on the other hand, has to preserve message boundaries which 
>>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>>
>>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte 
>>> MTU nic.
>>>
>>> I would be interested to see the results.  There could very well be issues.
>>
>> Agreed.
>>
>> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>>
>>> -vlad
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (19 preceding siblings ...)
  2014-04-11 20:53 ` Vlad Yasevich
@ 2014-04-11 20:57 ` Butler, Peter
  2014-04-11 23:58 ` Daniel Borkmann
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 20:57 UTC (permalink / raw)
  To: linux-sctp

I just realized that I didn't mention which kernel I tested the max_burst values with: it was 3.4.2.  (I haven't tested it with the 3.14 kernel)

So, to recap: on 3.4.2 max_burst=0 prevented any DATA traffic from being sent, and max_burst values from 1 to 16 yielded the same throughput of ~2.1 Gbps on the low latency network setup.




-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: April-11-14 4:54 PM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 04:14 PM, Butler, Peter wrote:
> I tried setting /proc/sys/net/sctp/max_burst to 0 on both nodes (both sides of the association): doing so prevented iperf from sending any traffic at all.  Only the initial 4-way handshake and subsequent heartbeat packets were transmitted.  When I kill the process (client and/or server) it reports 0 bits/s throughput.
> 
> I then tried various different values from 1 to 16 (default was 4), they all result in about the same 2.1 Gbps throughput (for the low-latency (0.2 ms RTT) case) as originally reported.
> 
> 

I didn't realize that max_burst=0 only works in the 3.14 kernel.
However, the data for burst of 16 definitely helps.  I was concerned that this might be a congestion window issue.

Thanks
-vlad
> 
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: April-11-14 2:20 PM
> To: Butler, Peter; Daniel Borkmann
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> On 04/11/2014 11:07 AM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading 
>> enabled.  (For what it's worth, the checksum offload gives about a 
>> 20% throughput gain - but this is, of course, already included in the 
>> numbers I posted to this thread as I've been using the CRC offload 
>> all
>> along.)
>>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable?  Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>>
>> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
> 
> Hi Peter
> 
> Could you run an experiment setting max_burst to 0?
> 
> The way the SCTP spec is written, upon every SACK, if the stack has new data to send, it will only send max_burst*mtu bytes.  This may not always fill the current congestion window and might preclude growth.
> 
> Setting max_burst to 0 will disable burst limitation.  I am curious to see if this would impact throughput on a low-rtt link like yours.
> 
> -vlad
> 
> 
>>
>>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 3:43 AM
>> To: Vlad Yasevich
>> Cc: Butler, Peter; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> Hi Peter,
>>
>> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>>
>>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>>
>>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>>
>>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>>
>>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>>
>>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>>
>>> To do more of an apples-to-apples comparison, you need to disable
>>> tso/gso on the sending node.
>>>
>>> The reason is that even if you limit buffer sizes, tcp will still
>>> try to do tso on the transmit side, thus coalescing your 1000-byte
>>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>>
>>> SCTP, on the other hand, has to preserve message boundaries which 
>>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>>
>>> My recommendation is to use 1464 byte message for SCTP on a 1500 
>>> byte MTU nic.
>>>
>>> I would be interested to see the results.  There could very well be issues.
>>
>> Agreed.
>>
>> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>>
>>> -vlad
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (20 preceding siblings ...)
  2014-04-11 20:57 ` Butler, Peter
@ 2014-04-11 23:58 ` Daniel Borkmann
  2014-04-12  7:27 ` Dongsheng Song
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 23:58 UTC (permalink / raw)
  To: linux-sctp

On 04/11/2014 10:18 PM, Butler, Peter wrote:
> It may take a little time to do the bisection.  Can you provide me with a
> guesstimate as to which kernel(s) may have introduced this behaviour?

Just out of curiosity, could you do a ...

git revert ef2820a735f74ea60335f8ba3801b844f0cb184d

... on your tree and try to see what you get with that?
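
A minimal sketch of the revert-and-rebuild cycle, assuming a clone of the mainline tree checked out at the 3.14 state under test:

  git show --stat ef2820a735f74ea60335f8ba3801b844f0cb184d   # see which files the commit touches
  git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
  make -j8 && make modules_install install                   # then reboot and re-run the iperf3 tests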

> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 2:40 PM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering.  An order of
>> magnitude or so.  For example, using precisely the same setup as before,
>> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
>> between 70-150 Mbps with 3.14 - a staggering difference.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>> that it is always trying to recover.  For example, with 3.4.2 the 2.1
>> Gbps throughput is quite consistent from one second to the next (as you
>> would expect):
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>> Connecting to host 192.168.241.3, port 5201
>>         Cookie: Lab200slot2.1397240355.069035.0d5b0f
>> [  4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   1.00-2.00   sec   253 MBytes  2.12 Gbits/sec
>> [  4]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   3.00-4.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   4.00-5.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec
>> [  4]   6.00-7.00   sec   253 MBytes  2.13 Gbits/sec
>> [  4]   7.00-8.00   sec   254 MBytes  2.13 Gbits/sec
>> [  4]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec
>> [  4]   9.00-10.00  sec   252 MBytes  2.12 Gbits/sec
>> (etc)
>>
>> but with 3.14 the numbers are all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>>         Cookie: Lab200slot2.1397238981.812898.548918
>> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
>> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
>> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
>> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
>> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
>> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
>> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
>> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
>> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
>> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
>> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
>> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly the
>> same TCP throughput in both kernels.
>
> Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
>
> Thanks,
>
> Daniel
>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 11:22 AM
>> To: Butler, Peter
>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>>> enabled.  (For what it's worth, the checksum offload gives about a 20%
>>> throughput gain - but this is, of course, already included in the
>>> numbers I posted to this thread as I've been using the CRC offload all
>>> along.)
>>
>> Ok, understood.
>>
>>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of
>>> the association - i.e. on both endpoint nodes), and using 1452-byte
>>> messages instead of 1000-byte messages.  With this new setup, the TCP
>>> performance drops significantly, as expected, while the SCTP
>>> performance is boosted, and the playing field is somewhat more 'level'.
>>> (Note that I could not use 1464-byte messages as suggested by Vlad, as
>>> anything above 1452 cut the SCTP performance in half - must have hit
>>> the segmentation limit at this slightly lower message size.  MTU is 1500.)
>>>
>>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>>> approximately 40-70% over the various range of network latencies I
>>> tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still
>>> significant, but nowhere near the 200% better (i.e. 3 times the
>>> throughput) I was getting before.
>>>
>>> Does this value (i.e. 40-70%) sound reasonable?  Is this the
>>> more-or-less accepted performance difference with the current LKSCTP
>>> implementation?
>>
>> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time, e.g. the way chunks are handled, imho, and copies involved in fast path, also that we're heavily using atomic reference counting, and other open issues.
>>
>>> Also, for what it's worth, I get better SCTP throughput numbers with
>>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
>> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (21 preceding siblings ...)
  2014-04-11 23:58 ` Daniel Borkmann
@ 2014-04-12  7:27 ` Dongsheng Song
  2014-04-14 14:52 ` Butler, Peter
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Dongsheng Song @ 2014-04-12  7:27 UTC (permalink / raw)
  To: linux-sctp

Hi all,

I found a strange SCTP stream behavior: when the message size is equal to
the recv buffer size, the performance is VERY LOW (just 1% !!!).  TCP has
no such issue; it only drops about 20%.

*) SCTP message size = 128K, recvbuffer size = 128K
netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_STREAM -- -m 128K -S 64K
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.201
() port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072 212992 131072    10.04       6.16

*) message size = 64K, recvbuffer size = 128K
Then I halve the message size, so the receive buffer is twice the message size:
netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_STREAM -- -m 64K -S 64K
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.201
() port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072 212992  65536    10.00     596.26

But TCP no such issue,

*) TCP message size = 128K, recvbuffer size = 128K
netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_STREAM -- -m 128K -S 64K
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.0.0.201 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384 131072    10.00    2368.84

*) TCP message size = 128K, recvbuffer size = 256K
netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_STREAM -- -m 128K -S 128K
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
10.0.0.201 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

262144  16384 131072    10.00    2993.15

I'm doing these tests on 2 VMware VMs with Intel Core i3-2120 @ 3.30GHz,
running Ubuntu 14.04 LTS, Linux kernel 3.13.0-24-generic.
SCTP_RR is very good, but SCTP_STREAM has a lot of room for improvement:

[9250.98] netperf -H 10.0.0.201 -p 1234 -l 10 -t UDP_RR  -- -r 960,512
[9070.85] netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_RR  -- -r 960,512
[2127.47] netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_CRR -- -r 960,512
[8126.60] netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_RR -- -r 960,512

[3380.84] netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_STREAM   -- -m 16384
[ 952.17] netperf -H 10.0.0.201 -p 1234 -l 10 -t UDP_STREAM   -- -m 65504
[ 699.98] netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_STREAM  -- -m 128K -s 512K -M 128K -S 256K

Regards,
Dongsheng
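
A minimal sketch reproducing the two SCTP cases above back to back, assuming netserver is already running on 10.0.0.201:1234 as in these runs:

  for msg in 128K 64K; do
          netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_STREAM -- -m $msg -S 64K
  done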

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (22 preceding siblings ...)
  2014-04-12  7:27 ` Dongsheng Song
@ 2014-04-14 14:52 ` Butler, Peter
  2014-04-14 15:49 ` Butler, Peter
                   ` (11 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 14:52 UTC (permalink / raw)
  To: linux-sctp

I started the bisection process - again this will take some time and I (unfortunately) can't dedicate all my time to this at work (other fires for me to put out here as well).  However on a hunch, based on TIPC work I had previously done that seemed to suggest that the 3.10.x kernel stream was the last stream before significant changes were made to the underlying net API, I tested the latest kernel in this stream, namely 3.10.36.

As I suspected, this kernel is still 'good' - that is, there is no erratic SCTP/IPv4 throughput behaviour.  However the throughput is still better with the 3.4.2 kernel (2.1 Gbps as opposed to 1.75 Gbps).

SUMMARY so far for SCTP+IPv4:

3.4.2 kernel:  smooth throughput 2.1 Gbps
3.10.36 kernel: smooth throughput 1.75 Gbps
3.14 kernel: highly erratic, terrible throughput
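
A minimal sketch of bisecting between those two points, assuming a mainline clone and using the v3.10/v3.14 tags as the good/bad endpoints (each candidate needs a build, boot, and iperf3 run):

  git bisect start v3.14 v3.10                # bad first, then good
  make -j8 && make modules_install install    # build and boot the candidate, then test
  git bisect good                             # or: git bisect bad, depending on the result
  # repeat until git prints the first bad commit, then:
  git bisect reset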




-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com] 
Sent: April-11-14 2:40 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 08:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering.  An order of
> magnitude or so.  For example, using precisely the same setup as before,
> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
> between 70-150 Mbps with 3.14 - a staggering difference.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover.  For example, with 3.4.2 the 2.1
> Gbps throughput is quite consistent from one second to the next (as you
> would expect):
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
> Time: Fri, 11 Apr 2014 18:19:15 GMT
> Connecting to host 192.168.241.3, port 5201
>        Cookie: Lab200slot2.1397240355.069035.0d5b0f
> [  4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   1.00-2.00   sec   253 MBytes  2.12 Gbits/sec
> [  4]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   3.00-4.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   4.00-5.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec
> [  4]   6.00-7.00   sec   253 MBytes  2.13 Gbits/sec
> [  4]   7.00-8.00   sec   254 MBytes  2.13 Gbits/sec
> [  4]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec
> [  4]   9.00-10.00  sec   252 MBytes  2.12 Gbits/sec
> (etc)
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
>        Cookie: Lab200slot2.1397238981.812898.548918
> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the
> same TCP throughput in both kernels.

Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?

Thanks,

Daniel

> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 11:22 AM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled.  (For what it's worth, the checksum offload gives about a 20%
>> throughput gain - but this is, of course, already included in the numbers
>> I posted to this thread as I've been using the CRC offload all along.)
>
> Ok, understood.
>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of
>> the association - i.e. on both endpoint nodes), and using 1452-byte
>> messages instead of 1000-byte messages.  With this new setup, the TCP
>> performance drops significantly, as expected, while the SCTP performance
>> is boosted, and the playing field is somewhat more 'level'.  (Note that
>> I could not use 1464-byte messages as suggested by Vlad, as anything
>> above 1452 cut the SCTP performance in half - must have hit the
>> segmentation limit at this slightly lower message size.  MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>> approximately 40-70% over the range of network latencies I tested with
>> (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant,
>> but nowhere near the 200% better (i.e. 3 times the throughput) I was
>> getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable?  Is this the
>> more-or-less accepted performance difference with the current LKSCTP
>> implementation?
>
> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time, e.g. the way chunks are handled, imho, and copies involved in fast path, also that we're heavily using atomic reference counting, and other open issues.
>
>> Also, for what it's worth, I get better SCTP throughput numbers with
>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>
> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>
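
For reference, two details from the exchange above. The offloads are typically disabled per interface with ethtool (the eth0 device name here is an assumption), and the 1452-byte ceiling falls out of the header arithmetic for SCTP over IPv4: 1500 (MTU) - 20 (IPv4 header) - 12 (SCTP common header) - 16 (DATA chunk header) = 1452 bytes of payload, so 1464 - which accounts for the DATA chunk header but not the 12-byte common header - no longer fits in a single segment.

ethtool -K eth0 tso off gso off gro off lro off   # run on both endpoints
ethtool -k eth0                                   # verify the offload state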

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (23 preceding siblings ...)
  2014-04-14 14:52 ` Butler, Peter
@ 2014-04-14 15:49 ` Butler, Peter
  2014-04-14 16:43 ` Butler, Peter
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 15:49 UTC (permalink / raw)
  To: linux-sctp

Here are some perf numbers.  Note that these were obtained with operf/opreport.  Only the top 30 or so lines of each report are shown here.

Identical load test performed on 3.4.2 and 3.14.
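
A sketch of how numbers like these are typically gathered; the event spec mirrors the report header below, but treat the exact operf/opreport flags as assumptions to check against the installed oprofile version:

operf --system-wide --events CPU_CLK_UNHALTED:100000 &   # profile all CPUs
# ... run the SCTP load test on this node ...
kill -INT %1                                             # stop profiling
opreport --symbols --debug-info | head -40               # top consumers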

3.4.2:

CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  cumulative samples  %        cumulative %     linenr info                 image name               symbol name
130002   130002         4.9435   4.9435    copy_user_64.S:240          vmlinux                  copy_user_generic_string
125955   255957         4.7896   9.7330    memcpy_64.S:59              vmlinux                  memcpy
71026    326983         2.7008  12.4339    spinlock.c:136              vmlinux                  _raw_spin_lock
57138    384121         2.1727  14.6066    slub.c:409                  vmlinux                  cmpxchg_double_slab
56559    440680         2.1507  16.7573    slub.c:2208                 vmlinux                  __slab_alloc
53058    493738         2.0176  18.7749    ixgbe_main.c:2952           ixgbe.ko                 ixgbe_poll
51541    545279         1.9599  20.7348    slub.c:2439                 vmlinux                  __slab_free
49916    595195         1.8981  22.6329    ip_tables.c:294             vmlinux                  ipt_do_table
42406    637601         1.6125  24.2455    ixgbe_main.c:7824           ixgbe.ko                 ixgbe_xmit_frame_ring
40929    678530         1.5564  25.8018    slub.c:3463                 vmlinux                  kfree
40349    718879         1.5343  27.3361    core.c:132                  vmlinux                  nf_iterate
35521    754400         1.3507  28.6869    output.c:347                sctp.ko                  sctp_packet_transmit
34071    788471         1.2956  29.9825    outqueue.c:1342             sctp.ko                  sctp_check_transmitted
33962    822433         1.2914  31.2739    slub.c:2601                 vmlinux                  kmem_cache_free
33450    855883         1.2720  32.5459    outqueue.c:735              sctp.ko                  sctp_outq_flush
33005    888888         1.2551  33.8009    skbuff.c:172                vmlinux                  __alloc_skb
29231    918119         1.1115  34.9125    socket.c:1565               sctp.ko                  sctp_sendmsg
27950    946069         1.0628  35.9753    (no location information)   libc-2.14.90.so          __memmove_ssse3_back
26718    972787         1.0160  36.9913    (no location information)   nf_conntrack.ko          nf_conntrack_in
26589    999376         1.0111  38.0023    slub.c:4049                 vmlinux                  __kmalloc_node_track_caller
26449    1025825        1.0058  39.0081    slub.c:2375                 vmlinux                  kmem_cache_alloc
26211    1052036        0.9967  40.0048    sm_sideeffect.c:1074        sctp.ko                  sctp_do_sm
25527    1077563        0.9707  40.9755    slub.c:2404                 vmlinux                  kmem_cache_alloc_node
23970    1101533        0.9115  41.8870    (no location information)   libc-2.14.90.so          _int_free
23266    1124799        0.8847  42.7717    memset_64.S:62              vmlinux                  memset
22976    1147775        0.8737  43.6454    (no location information)   nf_conntrack.ko          hash_conntrack_raw
21855    1169630        0.8311  44.4764    chunk.c:175                 sctp.ko                  sctp_datamsg_from_user
21730    1191360        0.8263  45.3027    list_debug.c:24             vmlinux                  __list_add
21252    1212612        0.8081  46.1109    dev.c:3151                  vmlinux                  __netif_receive_skb
20742    1233354        0.7887  46.8996    (no location information)   libc-2.14.90.so          _int_malloc
19955    1253309        0.7588  47.6584    input.c:130                 sctp.ko                  sctp_rcv



3.14:

CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  cumulative samples  %        cumulative %     linenr info                 image name               symbol name
168021   168021         6.1446   6.1446    copy_user_64.S:183          vmlinux-3.14.0           copy_user_generic_string
85199    253220         3.1158   9.2604    memcpy_64.S:59              vmlinux-3.14.0           memcpy
80133    333353         2.9305  12.1909    spinlock.c:174              vmlinux-3.14.0           _raw_spin_lock_bh
74086    407439         2.7094  14.9003    spinlock.c:150              vmlinux-3.14.0           _raw_spin_lock
51878    459317         1.8972  16.7975    ixgbe_main.c:6930           ixgbe.ko                 ixgbe_xmit_frame_ring
49354    508671         1.8049  18.6024    slub.c:2538                 vmlinux-3.14.0           __slab_free
39103    547774         1.4300  20.0324    outqueue.c:706              sctp.ko                  sctp_outq_flush
37775    585549         1.3815  21.4139    outqueue.c:1304             sctp.ko                  sctp_check_transmitted
37514    623063         1.3719  22.7858    output.c:380                sctp.ko                  sctp_packet_transmit
37320    660383         1.3648  24.1506    slub.c:2700                 vmlinux-3.14.0           kmem_cache_free
36147    696530         1.3219  25.4725    ip_tables.c:294             vmlinux-3.14.0           ipt_do_table
35494    732024         1.2980  26.7705    sm_sideeffect.c:1100        sctp.ko                  sctp_do_sm
35452    767476         1.2965  28.0670    core.c:135                  vmlinux-3.14.0           nf_iterate
34697    802173         1.2689  29.3359    slub.c:2281                 vmlinux-3.14.0           __slab_alloc
33890    836063         1.2394  30.5753    slub.c:415                  vmlinux-3.14.0           cmpxchg_double_slab
33566    869629         1.2275  31.8028    (no location information)   libc-2.14.90.so          _int_free
33228    902857         1.2152  33.0180    socket.c:1590               sctp.ko                  sctp_sendmsg
32774    935631         1.1986  34.2166    slub.c:3381                 vmlinux-3.14.0           kfree
30359    965990         1.1102  35.3268    (no location information)   libc-2.14.90.so          __memmove_ssse3_back
28905    994895         1.0571  36.3839    list_debug.c:25             vmlinux-3.14.0           __list_add
25888    1020783        0.9467  37.3306    skbuff.c:199                vmlinux-3.14.0           __alloc_skb
25490    1046273        0.9322  38.2628    fib_trie.c:1399             vmlinux-3.14.0           fib_table_lookup
25232    1071505        0.9227  39.1856    nf_conntrack_core.c:376     nf_conntrack.ko          __nf_conntrack_find_get
24114    1095619        0.8819  40.0674    chunk.c:168                 sctp.ko                  sctp_datamsg_from_user
24067    1119686        0.8801  40.9476    ixgbe_main.c:2020           ixgbe.ko                 ixgbe_clean_rx_irq
23972    1143658        0.8767  41.8242    (no location information)   libc-2.14.90.so          _int_malloc
22117    1165775        0.8088  42.6331    ip_output.c:215             vmlinux-3.14.0           ip_finish_output
22037    1187812        0.8059  43.4390    slub.c:3854                 vmlinux-3.14.0           __kmalloc_node_track_caller
21847    1209659        0.7990  44.2379    dev.c:2546                  vmlinux-3.14.0           dev_hard_start_xmit
21564    1231223        0.7886  45.0265    slub.c:2481                 vmlinux-3.14.0           kmem_cache_alloc
20659    1251882        0.7555  45.7820    socket.c:2049               sctp.ko                  sctp_recvmsg







-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: April-11-14 11:28 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading 
> enabled.  (For what it's worth, the checksum offload gives about a 20% 
> throughput gain - but this is, of course, already included in the 
> numbers I posted to this thread as I've been using the CRC offload all 
> along.)
> 
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
> 
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
> 
> Does this value (i.e. 40-70%) sound reasonable? 

This still looks high.  Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in sctp.

My guess is that a lot of it is going to be in memcpy(), but I am curious.

> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
> 
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
> 

That's interesting.  I'll have to look and see what might have changed here.

-vlad
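
A minimal sketch of the system-wide profiling Vlad asks for above, assuming a 60-second window to match the iperf3 runs:

perf record -a -g -- sleep 60   # sample all CPUs while the test runs
perf report --sort symbol       # inspect where the cycles go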

> 
> 
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> Hi Peter,
> 
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do a more of apples-to-apples comparison, you need to disable 
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try
>> to do tso on the transmit side, thus coalescing your 1000-byte
>> messages into something much larger, thus utilizing your MTU much more
>> efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which 
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte 
>> MTU nic.
>>
>> I would be interested to see the results.  There could very well be issues.
> 
> Agreed.
> 
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
> 
>> -vlad


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (24 preceding siblings ...)
  2014-04-14 15:49 ` Butler, Peter
@ 2014-04-14 16:43 ` Butler, Peter
  2014-04-14 16:45 ` Daniel Borkmann
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 16:43 UTC (permalink / raw)
  To: linux-sctp

With the git revert command applied, the SCTP+IPv4 performance on 3.14.0 is back to normal!  Same throughput as 3.4.2 and steady and smooth:

[root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
Time: Mon, 14 Apr 2014 16:40:48 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1397493648.413274.65e131
[  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
[  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
[  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
[  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
[  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
[  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
[  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
[  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec
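
For completeness, a sketch of the steps behind "with the git revert command applied", assuming a mainline tree checked out at v3.14 (kernel config and install steps vary by distro):

git checkout v3.14
git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
make -j$(nproc) && make modules_install && make install
# reboot into the rebuilt kernel and rerun the iperf3 test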






-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com] 
Sent: April-11-14 7:58 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 10:18 PM, Butler, Peter wrote:
> It may take a little time to do the bisection.  Can you provide me with
> a guesstimate as to which kernel(s) may have introduced this behaviour?

Just out of curiosity, could you do a ...

git revert ef2820a735f74ea60335f8ba3801b844f0cb184d

... on your tree and try to see what you get with that?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (25 preceding siblings ...)
  2014-04-14 16:43 ` Butler, Peter
@ 2014-04-14 16:45 ` Daniel Borkmann
  2014-04-14 16:47 ` Butler, Peter
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-14 16:45 UTC (permalink / raw)
  To: linux-sctp

Ok, thanks! I'll send out a revert today for now, as this is otherwise
catastrophic ... when that is done, we/the developer from NSN can still
think about his patch and how to solve that differently.

On 04/14/2014 06:43 PM, Butler, Peter wrote:
> With the git revert command applied, the SCTP+IPv4 performance on 3.14.0 is back to normal!  Same throughput as 3.4.2 and steady and smooth:
>
> [root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
> Time: Mon, 14 Apr 2014 16:40:48 GMT
> Connecting to host 192.168.240.3, port 5201
>        Cookie: Lab200slot2.1397493648.413274.65e131
> [  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
> [  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
> [  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
> [  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
> [  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
> [  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
> [  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
> [  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (26 preceding siblings ...)
  2014-04-14 16:45 ` Daniel Borkmann
@ 2014-04-14 16:47 ` Butler, Peter
  2014-04-14 17:06 ` Butler, Peter
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 16:47 UTC (permalink / raw)
  To: linux-sctp

Glad to be of help :o)

-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com] 
Sent: April-14-14 12:46 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

Ok, thanks! I'll send out a revert today for now, as this is otherwise catastrophic ... when that is done, we/the developer from NSN can still think about his patch and how to solve that differently.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (27 preceding siblings ...)
  2014-04-14 16:47 ` Butler, Peter
@ 2014-04-14 17:06 ` Butler, Peter
  2014-04-14 17:10 ` Butler, Peter
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 17:06 UTC (permalink / raw)
  To: linux-sctp

This particular branch of the email thread(s) is now defunct: as per Daniel Borkmann, the offending change was narrowed down to a single commit, which can be undone with:

git revert ef2820a735f74ea60335f8ba3801b844f0cb184d

With that commit reverted, 3.14.0 SCTP+IPv4 performance is back to normal and working properly.

See other branch of this thread for details.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (28 preceding siblings ...)
  2014-04-14 17:06 ` Butler, Peter
@ 2014-04-14 17:10 ` Butler, Peter
  2014-04-14 18:54 ` Matija Glavinic Pecotic
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 17:10 UTC (permalink / raw)
  To: linux-sctp

This particular branch of the email thread(s) is now defunct: as per Daniel Borkmann, the offending change was narrowed down to a single commit, which can be undone with:

git revert ef2820a735f74ea60335f8ba3801b844f0cb184d

With that commit reverted, 3.14.0 SCTP+IPv4 performance is back to normal and working properly.

See other branch of this thread for details.



-----Original Message-----
From: Butler, Peter 
Sent: April-14-14 11:50 AM
To: Vlad Yasevich; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: RE: Is SCTP throughput really this low compared to TCP?

Here are some perf numbers.  Note that these were obtained with operf/opreport.  Only the top 20 or so lines are shown here.

Identical load test performed on 3.4.2 and 3.14.

3.4.2:

CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  cumulative samples  %        cumulative %     linenr info                 image name               symbol name
130002   130002         4.9435   4.9435    copy_user_64.S:240          vmlinux                  copy_user_generic_string
125955   255957         4.7896   9.7330    memcpy_64.S:59              vmlinux                  memcpy
71026    326983         2.7008  12.4339    spinlock.c:136              vmlinux                  _raw_spin_lock
57138    384121         2.1727  14.6066    slub.c:409                  vmlinux                  cmpxchg_double_slab
56559    440680         2.1507  16.7573    slub.c:2208                 vmlinux                  __slab_alloc
53058    493738         2.0176  18.7749    ixgbe_main.c:2952           ixgbe.ko                 ixgbe_poll
51541    545279         1.9599  20.7348    slub.c:2439                 vmlinux                  __slab_free
49916    595195         1.8981  22.6329    ip_tables.c:294             vmlinux                  ipt_do_table
42406    637601         1.6125  24.2455    ixgbe_main.c:7824           ixgbe.ko                 ixgbe_xmit_frame_ring
40929    678530         1.5564  25.8018    slub.c:3463                 vmlinux                  kfree
40349    718879         1.5343  27.3361    core.c:132                  vmlinux                  nf_iterate
35521    754400         1.3507  28.6869    output.c:347                sctp.ko                  sctp_packet_transmit
34071    788471         1.2956  29.9825    outqueue.c:1342             sctp.ko                  sctp_check_transmitted
33962    822433         1.2914  31.2739    slub.c:2601                 vmlinux                  kmem_cache_free
33450    855883         1.2720  32.5459    outqueue.c:735              sctp.ko                  sctp_outq_flush
33005    888888         1.2551  33.8009    skbuff.c:172                vmlinux                  __alloc_skb
29231    918119         1.1115  34.9125    socket.c:1565               sctp.ko                  sctp_sendmsg
27950    946069         1.0628  35.9753    (no location information)   libc-2.14.90.so          __memmove_ssse3_back
26718    972787         1.0160  36.9913    (no location information)   nf_conntrack.ko          nf_conntrack_in
26589    999376         1.0111  38.0023    slub.c:4049                 vmlinux                  __kmalloc_node_track_caller
26449    1025825        1.0058  39.0081    slub.c:2375                 vmlinux                  kmem_cache_alloc
26211    1052036        0.9967  40.0048    sm_sideeffect.c:1074        sctp.ko                  sctp_do_sm
25527    1077563        0.9707  40.9755    slub.c:2404                 vmlinux                  kmem_cache_alloc_node
23970    1101533        0.9115  41.8870    (no location information)   libc-2.14.90.so          _int_free
23266    1124799        0.8847  42.7717    memset_64.S:62              vmlinux                  memset
22976    1147775        0.8737  43.6454    (no location information)   nf_conntrack.ko          hash_conntrack_raw
21855    1169630        0.8311  44.4764    chunk.c:175                 sctp.ko                  sctp_datamsg_from_user
21730    1191360        0.8263  45.3027    list_debug.c:24             vmlinux                  __list_add
21252    1212612        0.8081  46.1109    dev.c:3151                  vmlinux                  __netif_receive_skb
20742    1233354        0.7887  46.8996    (no location information)   libc-2.14.90.so          _int_malloc
19955    1253309        0.7588  47.6584    input.c:130                 sctp.ko                  sctp_rcv



3.14:

CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples  cumulative samples  %        cumulative %     linenr info                 image name               symbol name
168021   168021         6.1446   6.1446    copy_user_64.S:183          vmlinux-3.14.0           copy_user_generic_string
85199    253220         3.1158   9.2604    memcpy_64.S:59              vmlinux-3.14.0           memcpy
80133    333353         2.9305  12.1909    spinlock.c:174              vmlinux-3.14.0           _raw_spin_lock_bh
74086    407439         2.7094  14.9003    spinlock.c:150              vmlinux-3.14.0           _raw_spin_lock
51878    459317         1.8972  16.7975    ixgbe_main.c:6930           ixgbe.ko                 ixgbe_xmit_frame_ring
49354    508671         1.8049  18.6024    slub.c:2538                 vmlinux-3.14.0           __slab_free
39103    547774         1.4300  20.0324    outqueue.c:706              sctp.ko                  sctp_outq_flush
37775    585549         1.3815  21.4139    outqueue.c:1304             sctp.ko                  sctp_check_transmitted
37514    623063         1.3719  22.7858    output.c:380                sctp.ko                  sctp_packet_transmit
37320    660383         1.3648  24.1506    slub.c:2700                 vmlinux-3.14.0           kmem_cache_free
36147    696530         1.3219  25.4725    ip_tables.c:294             vmlinux-3.14.0           ipt_do_table
35494    732024         1.2980  26.7705    sm_sideeffect.c:1100        sctp.ko                  sctp_do_sm
35452    767476         1.2965  28.0670    core.c:135                  vmlinux-3.14.0           nf_iterate
34697    802173         1.2689  29.3359    slub.c:2281                 vmlinux-3.14.0           __slab_alloc
33890    836063         1.2394  30.5753    slub.c:415                  vmlinux-3.14.0           cmpxchg_double_slab
33566    869629         1.2275  31.8028    (no location information)   libc-2.14.90.so          _int_free
33228    902857         1.2152  33.0180    socket.c:1590               sctp.ko                  sctp_sendmsg
32774    935631         1.1986  34.2166    slub.c:3381                 vmlinux-3.14.0           kfree
30359    965990         1.1102  35.3268    (no location information)   libc-2.14.90.so          __memmove_ssse3_back
28905    994895         1.0571  36.3839    list_debug.c:25             vmlinux-3.14.0           __list_add
25888    1020783        0.9467  37.3306    skbuff.c:199                vmlinux-3.14.0           __alloc_skb
25490    1046273        0.9322  38.2628    fib_trie.c:1399             vmlinux-3.14.0           fib_table_lookup
25232    1071505        0.9227  39.1856    nf_conntrack_core.c:376     nf_conntrack.ko          __nf_conntrack_find_get
24114    1095619        0.8819  40.0674    chunk.c:168                 sctp.ko                  sctp_datamsg_from_user
24067    1119686        0.8801  40.9476    ixgbe_main.c:2020           ixgbe.ko                 ixgbe_clean_rx_irq
23972    1143658        0.8767  41.8242    (no location information)   libc-2.14.90.so          _int_malloc
22117    1165775        0.8088  42.6331    ip_output.c:215             vmlinux-3.14.0           ip_finish_output
22037    1187812        0.8059  43.4390    slub.c:3854                 vmlinux-3.14.0           __kmalloc_node_track_caller
21847    1209659        0.7990  44.2379    dev.c:2546                  vmlinux-3.14.0           dev_hard_start_xmit
21564    1231223        0.7886  45.0265    slub.c:2481                 vmlinux-3.14.0           kmem_cache_alloc
20659    1251882        0.7555  45.7820    socket.c:2049               sctp.ko                  sctp_recvmsg
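
(Note that in the 3.14 profile the top entries are copy_user_generic_string
and memcpy rather than anything in sctp.ko itself - no single SCTP symbol
accounts for even 2% of the samples.)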







-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: April-11-14 11:28 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading 
> enabled.  (For what it's worth, the checksum offload gives about a 20% 
> throughput gain - but this is, of course, already included in the 
> numbers I posted to this thread as I've been using the CRC offload all
> along.)
> 
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages.  With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.   (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size.  MTU is 1500.)
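> 
> (If my arithmetic is right, 1452 is exactly what the MTU math predicts:
> 1500 - 20 (IPv4 header) - 12 (SCTP common header) - 16 (DATA chunk header)
> = 1452 bytes of payload per DATA chunk, so anything larger has to be split
> across two chunks.)  For reference, the offloads were toggled with
> something like the following on both nodes - the interface name is just
> an example:
> 
>   ethtool -K eth2 tso off gso off gro off lro off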
> 
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
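> 
> (The added latencies were simulated with tc/netem on the test interface,
> along the lines of the following - again, device name and delay value are
> just examples; one such qdisc per direction:
> 
>   tc qdisc add dev eth2 root netem delay 10ms
> 
> and removed afterwards with 'tc qdisc del dev eth2 root'.)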
> 
> Does this value (i.e. 40-70%) sound reasonable? 

This still looks high.  Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in SCTP?
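
Something along these lines should be enough (the extra flags are just
a suggestion):

  # sample all CPUs, with call chains, while the iperf run is in flight
  perf record -a -g -- sleep 30
  # then summarize by symbol
  perf report --sort symbol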

My guess is that a lot of it is going to be in memcpy(), but I am curious.

> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
> 
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
> 

That's interesting.  I'll have to look and see what might have changed here.

-vlad

> 
> 
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> Hi Peter,
> 
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP.  Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems.  Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms.  Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do a more of apples-to-apples comparison, you need to disable 
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try
>> to do tso on the transmit side, coalescing your 1000-byte messages
>> into something much larger and thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries, which
>> results in sub-optimal MTU utilization when using 1000-byte payloads.
>>
>> My recommendation is to use a 1464-byte message for SCTP on a 1500-byte
>> MTU nic.
>>
>> I would be interested to see the results.  There could very well be issues.
> 
> Agreed.
> 
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
> 
>> -vlad


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (29 preceding siblings ...)
  2014-04-14 17:10 ` Butler, Peter
@ 2014-04-14 18:54 ` Matija Glavinic Pecotic
  2014-04-14 19:46 ` Daniel Borkmann
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Matija Glavinic Pecotic @ 2014-04-14 18:54 UTC (permalink / raw)
  To: linux-sctp

Hello,

On 04/14/2014 06:45 PM, ext Daniel Borkmann wrote:
> Ok, thanks! I'll send out a revert today as a stopgap, since this is
> otherwise catastrophic ... when that is done, we/the developer from NSN
> can still think about his patch and how to solve it differently.

Thanks from my side as well, I will for sure look into it.  Congestion control seems to be completely broken; it seems the stack doesn't like the new way rwnd is calculated in the reverted commit (if the advertised a_rwnd collapses, that would explain the stalls).  The dependency on IPv4 is also interesting.

I also wasn't able to reproduce this with my laptop/desktop; it seems some higher processing power is needed to hit the case.

> On 04/14/2014 06:43 PM, Butler, Peter wrote:
>> With the git revert command applied, the SCTP+IPv4 performance on 3.14.0 is back to normal!  Same throughput as 3.4.2 and steady and smooth:
>>
>> [root@Lab200slot2 ~]#  iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
>> Time: Mon, 14 Apr 2014 16:40:48 GMT
>> Connecting to host 192.168.240.3, port 5201
>>        Cookie: Lab200slot2.1397493648.413274.65e131
>> [  4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.00   sec   240 MBytes  2.02 Gbits/sec
>> [  4]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
>> [  4]   2.00-3.00   sec   240 MBytes  2.01 Gbits/sec
>> [  4]   3.00-4.00   sec   239 MBytes  2.00 Gbits/sec
>> [  4]   4.00-5.00   sec   245 MBytes  2.05 Gbits/sec
>> [  4]   5.00-6.00   sec   240 MBytes  2.01 Gbits/sec
>> [  4]   6.00-7.00   sec   240 MBytes  2.02 Gbits/sec
>> [  4]   7.00-8.00   sec   239 MBytes  2.01 Gbits/sec
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 7:58 PM
>> To: Butler, Peter
>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> On 04/11/2014 10:18 PM, Butler, Peter wrote:
>>> It may take a little time to do the bisection.  Can you provide me with a
>>> guesstimate as to which kernel(s) may have introduced this behaviour?
>>
>> Just out of curiosity, could you do a ...
>>
>> git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
>>
>> ... on your tree and try to see what you get with that?
>>
>>> -----Original Message-----
>>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>>> Sent: April-11-14 2:40 PM
>>> To: Butler, Peter
>>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>>
>>> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>>>> The difference between 3.14 and 3.4.2 is staggering.  An order of
>>>> magnitude or so.  For example, using precisely the same setup as before,
>>>> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
>>>> between 70-150 Mbps with 3.14 - a staggering difference.
>>>>
>>>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>>>> that it is always trying to recover.  For example, with 3.4.2 the
>>>> 2.1 Gbps throughput is quite consistent from one second to the next
>>>> (as you would expect):
>>>>
>>>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>>>> iperf version 3.0.1 (10 January 2014)
>>>> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>>>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>>>> Connecting to host 192.168.241.3, port 5201
>>>>          Cookie: Lab200slot2.1397240355.069035.0d5b0f
>>>> [  4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
>>>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>>>> [ ID] Interval           Transfer     Bandwidth
>>>> [  4]   0.00-1.00   sec   255 MBytes  2.14 Gbits/sec
>>>> [  4]   1.00-2.00   sec   253 MBytes  2.12 Gbits/sec
>>>> [  4]   2.00-3.00   sec   255 MBytes  2.14 Gbits/sec
>>>> [  4]   3.00-4.00   sec   255 MBytes  2.14 Gbits/sec
>>>> [  4]   4.00-5.00   sec   255 MBytes  2.14 Gbits/sec
>>>> [  4]   5.00-6.00   sec   257 MBytes  2.15 Gbits/sec
>>>> [  4]   6.00-7.00   sec   253 MBytes  2.13 Gbits/sec
>>>> [  4]   7.00-8.00   sec   254 MBytes  2.13 Gbits/sec
>>>> [  4]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec
>>>> [  4]   9.00-10.00  sec   252 MBytes  2.12 Gbits/sec
>>>> (etc)
>>>>
>>>> but with 3.14 the numbers are all over the place:
>>>>
>>>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>>>> iperf version 3.0.1 (10 January 2014)
>>>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>>>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>>>> Connecting to host 192.168.241.3, port 5201
>>>>          Cookie: Lab200slot2.1397238981.812898.548918
>>>> [  4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>>>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>>>> [ ID] Interval           Transfer     Bandwidth
>>>> [  4]   0.00-1.09   sec  20.8 MBytes   161 Mbits/sec
>>>> [  4]   1.09-2.13   sec  10.8 MBytes  86.8 Mbits/sec
>>>> [  4]   2.13-3.15   sec  3.57 MBytes  29.5 Mbits/sec
>>>> [  4]   3.15-4.16   sec  4.33 MBytes  35.7 Mbits/sec
>>>> [  4]   4.16-6.21   sec  10.4 MBytes  42.7 Mbits/sec
>>>> [  4]   6.21-6.21   sec  0.00 Bytes  0.00 bits/sec
>>>> [  4]   6.21-7.35   sec  34.6 MBytes   253 Mbits/sec
>>>> [  4]   7.35-11.45  sec  22.0 MBytes  45.0 Mbits/sec
>>>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>>>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>>>> [  4]  11.45-11.45  sec  0.00 Bytes  0.00 bits/sec
>>>> [  4]  11.45-12.51  sec  16.0 MBytes   126 Mbits/sec
>>>> [  4]  12.51-13.59  sec  20.3 MBytes   158 Mbits/sec
>>>> [  4]  13.59-14.65  sec  13.4 MBytes   107 Mbits/sec
>>>> [  4]  14.65-16.79  sec  33.3 MBytes   130 Mbits/sec
>>>> [  4]  16.79-16.79  sec  0.00 Bytes  0.00 bits/sec
>>>> [  4]  16.79-17.82  sec  5.94 MBytes  48.7 Mbits/sec
>>>> (etc)
>>>>
>>>> Note: the difference appears to be SCTP-specific, as I get exactly
>>>> the same TCP throughput in both kernels.
>>>
>>> Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
>>>
>>> Thanks,
>>>
>>> Daniel
>>>
>>>> -----Original Message-----
>>>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>>>> Sent: April-11-14 11:22 AM
>>>> To: Butler, Peter
>>>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>>>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>>>
>>>> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>>>>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>>>>> enabled.  (For what it's worth, the checksum offload gives about a 20%
>>>>> throughput gain - but this is, of course, already included in the
>>>>> numbers I posted to this thread as I've been using the CRC offload
>>>>> all along.)
>>>>
>>>> Ok, understood.
>>>>
>>>>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides
>>>>> of the association - i.e. on both endpoint nodes), and using 1452-byte
>>>>> messages instead of 1000-byte messages.  With this new setup, the TCP
>>>>> performance drops significantly, as expected, while the SCTP performance
>>>>> is boosted, and the playing field is somewhat more 'level'.
>>>>> (Note that I could not use 1464-byte messages as suggested by Vlad, as
>>>>> anything above 1452 cut the SCTP performance in half - must have hit
>>>>> the segmentation limit at this slightly lower message size.  MTU is 1500.)
>>>>>
>>>>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>>>>> approximately 40-70% over the range of network latencies I tested with
>>>>> (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms).  40-70% is still significant,
>>>>> but nowhere near the 200% better (i.e. 3 times the throughput) I was
>>>>> getting before.
>>>>>
>>>>> Does this value (i.e. 40-70%) sound reasonable?  Is this the
>>>>> more-or-less accepted performance difference with the current LKSCTP
>>>>> implementation?
>>>>
>>>> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time: e.g. the way chunks are handled (imho), the copies involved in the fast path, and the heavy use of atomic reference counting, among other open issues.
>>>>
>>>>> Also, for what it's worth, I get better SCTP throughput numbers with
>>>>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>>>>
>>>> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>>>>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (30 preceding siblings ...)
  2014-04-14 18:54 ` Matija Glavinic Pecotic
@ 2014-04-14 19:46 ` Daniel Borkmann
  2014-04-17 15:26 ` Vlad Yasevich
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-14 19:46 UTC (permalink / raw)
  To: linux-sctp

On 04/14/2014 08:54 PM, Matija Glavinic Pecotic wrote:
> Hello,
>
> On 04/14/2014 06:45 PM, ext Daniel Borkmann wrote:
>> Ok, thanks! I'll send out a revert today as a stopgap, since this is
>> otherwise catastrophic ... when that is done, we/the developer from NSN
>> can still think about his patch and how to solve it differently.
>
> Thanks from my side as well, I will for sure look into it.  Congestion control seems to be completely broken; it seems the stack doesn't like the new way rwnd is calculated in the reverted commit (if the advertised a_rwnd collapses, that would explain the stalls).  The dependency on IPv4 is also interesting.
>
> I also wasn't able to reproduce this with my laptop/desktop; it seems some higher processing power is needed to hit the case.

Posted here, Peter: http://patchwork.ozlabs.org/patch/339046/

Thanks for reporting and testing!

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (31 preceding siblings ...)
  2014-04-14 19:46 ` Daniel Borkmann
@ 2014-04-17 15:26 ` Vlad Yasevich
  2014-04-17 16:15 ` Butler, Peter
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-17 15:26 UTC (permalink / raw)
  To: linux-sctp

On 04/14/2014 12:47 PM, Butler, Peter wrote:
> Glad to be of help :o)
>

Hi Peter

Would you be able to run this test again with the following patch
on top of the problematic code?
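
The idea is to move the window-update SACK check out of
sctp_ulpevent_release_data() and into sctp_ulpevent_free(), so that
sctp_assoc_rwnd_update() only runs once the skb has actually been
freed and accounted back against the receive buffer.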

Thanks
-vlad


commit c9888a220916284403c5115d6c6c7e33a00d0b55
Author: Vlad Yasevich <vyasevic@redhat.com>
Date:   Thu Apr 17 09:21:52 2014 -0400

    sctp: Trigger window update SACK after skb has been freed.

    Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8d198ae..b59a7c5 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
 {
 	struct sk_buff *skb, *frag;
 	unsigned int	len;
-	struct sctp_association *asoc;

 	/* Current stack structures assume that the rcv buffer is
 	 * per socket.   For UDP style sockets this is not true as
@@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
 	}

 done:
-	asoc = event->asoc;
-	sctp_association_hold(asoc);
 	sctp_ulpevent_release_owner(event);
-	sctp_assoc_rwnd_update(asoc, true);
-	sctp_association_put(asoc);
 }

 static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event)
@@ -1071,12 +1066,21 @@ done:
  */
 void sctp_ulpevent_free(struct sctp_ulpevent *event)
 {
+	struct sctp_association *asoc = event->asoc;
+
 	if (sctp_ulpevent_is_notification(event))
 		sctp_ulpevent_release_owner(event);
 	else
 		sctp_ulpevent_release_data(event);

 	kfree_skb(sctp_event2skb(event));
+	/* The socket is locked and the association can't go anywhere
+	 * since we are walking the ulpqueue.  No need to hold
+	 * another ref on the association.  Now that the skb has been
+	 * freed and accounted for everywhere, see if we need to send
+	 * a window update SACK.
+	 */
+	sctp_assoc_rwnd_update(asoc, true);
 }

 /* Purge the skb lists holding ulpevents. */
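
(This should apply on top of plain 3.14.0 - i.e. with the rwnd change
still in - with a simple 'git am' on the raw mail or 'patch -p1' in
the kernel tree.)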

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (32 preceding siblings ...)
  2014-04-17 15:26 ` Vlad Yasevich
@ 2014-04-17 16:15 ` Butler, Peter
  2014-04-22 21:50 ` Butler, Peter
  2014-04-23 12:59 ` Vlad Yasevich
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-17 16:15 UTC (permalink / raw)
  To: linux-sctp

I should be able to, though I won't be able to run the test until Tuesday, April 22...

________________________________________
From: Vlad Yasevich [vyasevich@gmail.com]
Sent: Thursday, April 17, 2014 11:26 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/14/2014 12:47 PM, Butler, Peter wrote:
> Glad to be of help :o)
>

Hi Peter

Would you be able to run this test again with the following patch
on top of the problematic code?

Thanks
-vlad


commit c9888a220916284403c5115d6c6c7e33a00d0b55
Author: Vlad Yasevich <vyasevic@redhat.com>
Date:   Thu Apr 17 09:21:52 2014 -0400

    sctp: Trigger window update SACK after skb has been freed.

    Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8d198ae..b59a7c5 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
 {
        struct sk_buff *skb, *frag;
        unsigned int    len;
-       struct sctp_association *asoc;

        /* Current stack structures assume that the rcv buffer is
         * per socket.   For UDP style sockets this is not true as
@@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
        }

 done:
-       asoc = event->asoc;
-       sctp_association_hold(asoc);
        sctp_ulpevent_release_owner(event);
-       sctp_assoc_rwnd_update(asoc, true);
-       sctp_association_put(asoc);
 }

 static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event)
@@ -1071,12 +1066,21 @@ done:
  */
 void sctp_ulpevent_free(struct sctp_ulpevent *event)
 {
+       struct sctp_association *asoc = event->asoc;
+
        if (sctp_ulpevent_is_notification(event))
                sctp_ulpevent_release_owner(event);
        else
                sctp_ulpevent_release_data(event);

        kfree_skb(sctp_event2skb(event));
+       /* The socket is locked and the association can't go anywhere
+        * since we are walking the ulpqueue.  No need to hold
+        * another ref on the association.  Now that the skb has been
+        * freed and accounted for everywhere, see if we need to send
+        * a window update SACK.
+        */
+       sctp_assoc_rwnd_update(asoc, true);
 }

 /* Purge the skb lists holding ulpevents. */

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* RE: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (33 preceding siblings ...)
  2014-04-17 16:15 ` Butler, Peter
@ 2014-04-22 21:50 ` Butler, Peter
  2014-04-23 12:59 ` Vlad Yasevich
  35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-22 21:50 UTC (permalink / raw)
  To: linux-sctp

When I apply the patch you provided to the standard 3.14.0 kernel, I still get the highly erratic throughput (see output below).  It was only when I did the full "git revert" as suggested by Daniel that the erratic behaviour went away.

[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Tue, 22 Apr 2014 21:44:24 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1398203064.823332.513f07
[  4] local 192.168.240.2 port 55819 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.08   sec  23.9 MBytes   186 Mbits/sec                  
[  4]   1.08-2.13   sec  16.0 MBytes   128 Mbits/sec                  
[  4]   2.13-3.95   sec   198 MBytes   913 Mbits/sec                  
[  4]   3.95-4.00   sec  15.8 MBytes  2.62 Gbits/sec                  
[  4]   4.00-5.00   sec   226 MBytes  1.90 Gbits/sec                  
[  4]   5.00-6.84   sec   180 MBytes   819 Mbits/sec                  
[  4]   6.84-7.00   sec  44.0 MBytes  2.30 Gbits/sec                  
[  4]   7.00-8.01   sec  6.31 MBytes  52.2 Mbits/sec                  
[  4]   8.01-9.08   sec  21.3 MBytes   167 Mbits/sec                  
[  4]   9.08-10.12  sec  13.2 MBytes   107 Mbits/sec                  
[  4]  10.12-11.17  sec  14.8 MBytes   119 Mbits/sec                  
[  4]  11.17-12.97  sec   180 MBytes   839 Mbits/sec                  
[  4]  12.97-13.00  sec  8.25 MBytes  2.27 Gbits/sec                  
[  4]  13.00-14.10  sec  30.6 MBytes   234 Mbits/sec                  
[  4]  14.10-15.95  sec   191 MBytes   866 Mbits/sec                  
[  4]  15.95-16.00  sec  15.1 MBytes  2.51 Gbits/sec                  
[  4]  16.00-17.00  sec   219 MBytes  1.84 Gbits/sec                  
[  4]  17.00-18.09  sec  28.5 MBytes   218 Mbits/sec                  
[  4]  18.09-19.13  sec  11.4 MBytes  92.5 Mbits/sec                  
[  4]  19.13-20.17  sec  14.1 MBytes   114 Mbits/sec                  
[  4]  20.17-21.21  sec  13.0 MBytes   105 Mbits/sec                  
[  4]  21.21-23.27  sec  16.8 MBytes  68.4 Mbits/sec                  
[  4]  23.27-23.27  sec  0.00 Bytes  0.00 bits/sec                  
[  4]  23.27-24.00  sec   168 MBytes  1.91 Gbits/sec                  
[  4]  24.00-25.76  sec   179 MBytes   852 Mbits/sec                  
[  4]  25.76-26.00  sec  21.2 MBytes   760 Mbits/sec                  






-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: April-17-14 11:27 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?

On 04/14/2014 12:47 PM, Butler, Peter wrote:
> Glad to be of help :o)
>

Hi Peter

Would you be able to run this test again with the following patch on top of the problematic code?

Thanks
-vlad


commit c9888a220916284403c5115d6c6c7e33a00d0b55
Author: Vlad Yasevich <vyasevic@redhat.com>
Date:   Thu Apr 17 09:21:52 2014 -0400

    sctp: Trigger window update SACK after skb has been freed.

    Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8d198ae..b59a7c5 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
 {
 	struct sk_buff *skb, *frag;
 	unsigned int	len;
-	struct sctp_association *asoc;

 	/* Current stack structures assume that the rcv buffer is
 	 * per socket.   For UDP style sockets this is not true as
@@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
 	}

 done:
-	asoc = event->asoc;
-	sctp_association_hold(asoc);
 	sctp_ulpevent_release_owner(event);
-	sctp_assoc_rwnd_update(asoc, true);
-	sctp_association_put(asoc);
 }

 static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event)
@@ -1071,12 +1066,21 @@ done:
  */
 void sctp_ulpevent_free(struct sctp_ulpevent *event)
 {
+	struct sctp_association *asoc = event->asoc;
+
 	if (sctp_ulpevent_is_notification(event))
 		sctp_ulpevent_release_owner(event);
 	else
 		sctp_ulpevent_release_data(event);

 	kfree_skb(sctp_event2skb(event));
+	/* The socket is locked and the association can't go anywhere
+	 * since we are walking the ulpqueue.  No need to hold
+	 * another ref on the association.  Now that the skb has been
+	 * freed and accounted for everywhere, see if we need to send
+	 * a window update SACK.
+	 */
+	sctp_assoc_rwnd_update(asoc, true);
 }

 /* Purge the skb lists holding ulpevents. */

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Is SCTP throughput really this low compared to TCP?
  2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
                   ` (34 preceding siblings ...)
  2014-04-22 21:50 ` Butler, Peter
@ 2014-04-23 12:59 ` Vlad Yasevich
  35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-23 12:59 UTC (permalink / raw)
  To: linux-sctp

On 04/22/2014 05:50 PM, Butler, Peter wrote:
> When I apply the patch you provided to the standard 3.14.0 kernel, I still get the highly erratic throughput (see output below).  It was only when I did the full "git revert" as suggested by Daniel that the erratic behaviour went away.
> 
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Tue, 22 Apr 2014 21:44:24 GMT
> Connecting to host 192.168.240.3, port 5201
>       Cookie: Lab200slot2.1398203064.823332.513f07
> [  4] local 192.168.240.2 port 55819 connected to 192.168.240.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval           Transfer     Bandwidth
> [  4]   0.00-1.08   sec  23.9 MBytes   186 Mbits/sec                  
> [  4]   1.08-2.13   sec  16.0 MBytes   128 Mbits/sec                  
> [  4]   2.13-3.95   sec   198 MBytes   913 Mbits/sec                  
> [  4]   3.95-4.00   sec  15.8 MBytes  2.62 Gbits/sec                  
> [  4]   4.00-5.00   sec   226 MBytes  1.90 Gbits/sec                  
> [  4]   5.00-6.84   sec   180 MBytes   819 Mbits/sec                  
> [  4]   6.84-7.00   sec  44.0 MBytes  2.30 Gbits/sec                  
> [  4]   7.00-8.01   sec  6.31 MBytes  52.2 Mbits/sec                  
> [  4]   8.01-9.08   sec  21.3 MBytes   167 Mbits/sec                  
> [  4]   9.08-10.12  sec  13.2 MBytes   107 Mbits/sec                  
> [  4]  10.12-11.17  sec  14.8 MBytes   119 Mbits/sec                  
> [  4]  11.17-12.97  sec   180 MBytes   839 Mbits/sec                  
> [  4]  12.97-13.00  sec  8.25 MBytes  2.27 Gbits/sec                  
> [  4]  13.00-14.10  sec  30.6 MBytes   234 Mbits/sec                  
> [  4]  14.10-15.95  sec   191 MBytes   866 Mbits/sec                  
> [  4]  15.95-16.00  sec  15.1 MBytes  2.51 Gbits/sec                  
> [  4]  16.00-17.00  sec   219 MBytes  1.84 Gbits/sec                  
> [  4]  17.00-18.09  sec  28.5 MBytes   218 Mbits/sec                  
> [  4]  18.09-19.13  sec  11.4 MBytes  92.5 Mbits/sec                  
> [  4]  19.13-20.17  sec  14.1 MBytes   114 Mbits/sec                  
> [  4]  20.17-21.21  sec  13.0 MBytes   105 Mbits/sec                  
> [  4]  21.21-23.27  sec  16.8 MBytes  68.4 Mbits/sec                  
> [  4]  23.27-23.27  sec  0.00 Bytes  0.00 bits/sec                  
> [  4]  23.27-24.00  sec   168 MBytes  1.91 Gbits/sec                  
> [  4]  24.00-25.76  sec   179 MBytes   852 Mbits/sec                  
> [  4]  25.76-26.00  sec  21.2 MBytes   760 Mbits/sec                  
> 
> 

Thanks Peter.  This means there is something else wrong...

-vlad

> 
> 
> 
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
> Sent: April-17-14 11:27 AM
> To: Butler, Peter; Daniel Borkmann
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
> 
> On 04/14/2014 12:47 PM, Butler, Peter wrote:
>> Glad to be of help :o)
>>
> 
> Hi Peter
> 
> Would you be able to run this test again with the following patch on top of the problematic code.
> 
> Thanks
> -vlad
> 
> 
> commit c9888a220916284403c5115d6c6c7e33a00d0b55
> Author: Vlad Yasevich <vyasevic@redhat.com>
> Date:   Thu Apr 17 09:21:52 2014 -0400
> 
>     sctp: Trigger window update SACK after skb has been freed.
> 
>     Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
> 
> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
> index 8d198ae..b59a7c5 100644
> --- a/net/sctp/ulpevent.c
> +++ b/net/sctp/ulpevent.c
> @@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
>  {
>  	struct sk_buff *skb, *frag;
>  	unsigned int	len;
> -	struct sctp_association *asoc;
> 
>  	/* Current stack structures assume that the rcv buffer is
>  	 * per socket.   For UDP style sockets this is not true as
> @@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
>  	}
> 
>  done:
> -	asoc = event->asoc;
> -	sctp_association_hold(asoc);
>  	sctp_ulpevent_release_owner(event);
> -	sctp_assoc_rwnd_update(asoc, true);
> -	sctp_association_put(asoc);
>  }
> 
>  static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event)
> @@ -1071,12 +1066,21 @@ done:
>   */
>  void sctp_ulpevent_free(struct sctp_ulpevent *event)
>  {
> +	struct sctp_association *asoc = event->asoc;
> +
>  	if (sctp_ulpevent_is_notification(event))
>  		sctp_ulpevent_release_owner(event);
>  	else
>  		sctp_ulpevent_release_data(event);
> 
>  	kfree_skb(sctp_event2skb(event));
> +	/* The socket is locked and the association can't go anywhere
> +	 * since we are walking the ulpqueue.  No need to hold
> +	 * another ref on the association.  Now that the skb has been
> +	 * freed and accounted for everywhere, see if we need to send
> +	 * a window update SACK.
> +	 */
> +	sctp_assoc_rwnd_update(asoc, true);
>  }
> 
>  /* Purge the skb lists holding ulpevents. */
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2014-04-23 12:59 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
2014-04-10 20:21 ` Vlad Yasevich
2014-04-10 20:40 ` Butler, Peter
2014-04-10 21:00 ` Vlad Yasevich
2014-04-11  7:42 ` Daniel Borkmann
2014-04-11 15:07 ` Butler, Peter
2014-04-11 15:21 ` Daniel Borkmann
2014-04-11 15:27 ` Vlad Yasevich
2014-04-11 15:35 ` Daniel Borkmann
2014-04-11 18:19 ` Vlad Yasevich
2014-04-11 18:22 ` Butler, Peter
2014-04-11 18:40 ` Daniel Borkmann
2014-04-11 18:41 ` Daniel Borkmann
2014-04-11 18:58 ` Butler, Peter
2014-04-11 19:16 ` Butler, Peter
2014-04-11 19:20 ` Vlad Yasevich
2014-04-11 19:24 ` Butler, Peter
2014-04-11 20:14 ` Butler, Peter
2014-04-11 20:18 ` Butler, Peter
2014-04-11 20:51 ` Vlad Yasevich
2014-04-11 20:53 ` Vlad Yasevich
2014-04-11 20:57 ` Butler, Peter
2014-04-11 23:58 ` Daniel Borkmann
2014-04-12  7:27 ` Dongsheng Song
2014-04-14 14:52 ` Butler, Peter
2014-04-14 15:49 ` Butler, Peter
2014-04-14 16:43 ` Butler, Peter
2014-04-14 16:45 ` Daniel Borkmann
2014-04-14 16:47 ` Butler, Peter
2014-04-14 17:06 ` Butler, Peter
2014-04-14 17:10 ` Butler, Peter
2014-04-14 18:54 ` Matija Glavinic Pecotic
2014-04-14 19:46 ` Daniel Borkmann
2014-04-17 15:26 ` Vlad Yasevich
2014-04-17 16:15 ` Butler, Peter
2014-04-22 21:50 ` Butler, Peter
2014-04-23 12:59 ` Vlad Yasevich
