* Is SCTP throughput really this low compared to TCP?
@ 2014-04-10 19:12 Butler, Peter
2014-04-10 20:21 ` Vlad Yasevich
` (35 more replies)
0 siblings, 36 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-10 19:12 UTC (permalink / raw)
To: linux-sctp
I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
All TCP/SCTP tests were performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
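The tc-based latency simulation described above can be sketched with netem as follows; `eth0` is a placeholder for the actual test interface:

```shell
# Add 10 ms of one-way egress delay (run on each node, giving ~20 ms RTT).
tc qdisc add dev eth0 root netem delay 10ms

# Remove the netem qdisc to restore the default when the test is done.
tc qdisc del dev eth0 root
```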
In addition, each of these network scenarios was tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB) to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
@ 2014-04-10 20:21 ` Vlad Yasevich
2014-04-10 20:40 ` Butler, Peter
` (34 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-10 20:21 UTC (permalink / raw)
To: linux-sctp
On 04/10/2014 03:12 PM, Butler, Peter wrote:
> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>
> All TCP/SCTP tests were performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>
> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>
> In addition, each of these network scenarios was tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB) to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>
> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>
> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>
>
To do a more apples-to-apples comparison, you need to disable tso/gso
on the sending node.
The reason is that even if you limit buffer sizes, tcp will still try to
do tso on the transmit side, thus coalescing your 1000-byte messages into
something much larger, thus utilizing your MTU much more efficiently.
SCTP, on the other hand, has to preserve message boundaries, which
results in sub-optimal MTU utilization when using 1000-byte payloads.
My recommendation is to use 1464-byte messages for SCTP on a NIC with a
1500-byte MTU.
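The suggested sender-side change can be sketched with ethtool; the interface name `eth0` is a placeholder:

```shell
# Disable TCP segmentation offload and generic segmentation offload on the
# sending node, so TCP can no longer coalesce the 1000-byte messages.
ethtool -K eth0 tso off gso off

# Verify the new feature settings.
ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
```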
I would be interested to see the results. There could very well be issues.
-vlad
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
2014-04-10 20:21 ` Vlad Yasevich
@ 2014-04-10 20:40 ` Butler, Peter
2014-04-10 21:00 ` Vlad Yasevich
` (33 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-10 20:40 UTC (permalink / raw)
To: linux-sctp
Thanks - I will give that a try. What about generic-receive-offload and large-receive-offload?
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
2014-04-10 20:21 ` Vlad Yasevich
2014-04-10 20:40 ` Butler, Peter
@ 2014-04-10 21:00 ` Vlad Yasevich
2014-04-11 7:42 ` Daniel Borkmann
` (32 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-10 21:00 UTC (permalink / raw)
To: linux-sctp
On 04/10/2014 04:40 PM, Butler, Peter wrote:
> Thanks - I will give that a try. What about generic-receive-offload and large-receive-offload ?
They do help tcp a bit by allowing it to ack more data in one shot.
If they are on, it might make sense to turn them off.
I suppose sctp could benefit from GRO a bit...
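The corresponding receive-side change is analogous; `eth0` is again a placeholder interface:

```shell
# Disable generic receive offload and large receive offload on the receiver.
ethtool -K eth0 gro off lro off
```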
-vlad
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (2 preceding siblings ...)
2014-04-10 21:00 ` Vlad Yasevich
@ 2014-04-11 7:42 ` Daniel Borkmann
2014-04-11 15:07 ` Butler, Peter
` (31 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 7:42 UTC (permalink / raw)
To: linux-sctp
Hi Peter,
On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>>
>> All TCP/SCTP tests were performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>>
>> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>
>> In addition, each of these network scenarios was tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB) to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>
>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>
>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>
> To do a more apples-to-apples comparison, you need to disable tso/gso
> on the sending node.
>
> The reason is that even if you limit buffer sizes, tcp will still try to
> do tso on the transmit side, thus coalescing your 1000-byte messages into
> something much larger, thus utilizing your MTU much more efficiently.
>
> SCTP, on the other hand, has to preserve message boundaries, which
> results in sub-optimal MTU utilization when using 1000-byte payloads.
>
> My recommendation is to use 1464-byte messages for SCTP on a NIC with a
> 1500-byte MTU.
>
> I would be interested to see the results. There could very well be issues.
Agreed.
Also, what NIC are you using? It seems only Intel provides SCTP checksum
offloading so far, i.e. ixgbe/i40e NICs.
> -vlad
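A quick way to check whether a given NIC/driver advertises SCTP checksum offload is to inspect the ethtool feature list; `eth0` is a placeholder:

```shell
# On recent kernels the feature is reported as "tx-checksum-sctp".
ethtool -k eth0 | grep sctp
```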
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (3 preceding siblings ...)
2014-04-11 7:42 ` Daniel Borkmann
@ 2014-04-11 15:07 ` Butler, Peter
2014-04-11 15:21 ` Daniel Borkmann
` (30 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 15:07 UTC (permalink / raw)
To: linux-sctp
Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled. (For what it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
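The observed 1452-byte ceiling is consistent with the fixed per-packet SCTP overheads on a 1500-byte IPv4 MTU; a quick back-of-the-envelope check:

```shell
# Largest single DATA chunk payload that fits in one 1500-byte IPv4 packet:
# IPv4 header (20) + SCTP common header (12) + DATA chunk header (16) = 48 bytes.
MTU=1500; IP_HDR=20; SCTP_COMMON=12; DATA_CHUNK=16
echo $(( MTU - IP_HDR - SCTP_COMMON - DATA_CHUNK ))   # prints 1452
```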
So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
Does this value (i.e. 40-70%) sound reasonable? Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (4 preceding siblings ...)
2014-04-11 15:07 ` Butler, Peter
@ 2014-04-11 15:21 ` Daniel Borkmann
2014-04-11 15:27 ` Vlad Yasevich
` (29 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 15:21 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 05:07 PM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled. (For what
> it's worth, the checksum offload gives about a 20% throughput gain - but this is,
> of course, already included in the numbers I posted to this thread as I've been using
> the CRC offload all along.)
Ok, understood.
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association -
> i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte
> messages. With this new setup, the TCP performance drops significantly, as expected,
> while the SCTP performance is boosted, and the playing field is somewhat more 'level'.
> (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above
> 1452 cut the SCTP performance in half - must have hit the segmentation limit at this
> slightly lower message size. MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70%
> over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms,
> and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3
> times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable? Is this the more-or-less accepted
> performance difference with the current LKSCTP implementation?
Yes, that sounds reasonable to me. There are still a lot of open todos in terms of
performance that we need to tackle over time, e.g. the way chunks are handled, imho,
the copies involved in the fast path, and our heavy use of atomic reference
counting, among other open issues.
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel
> (3.4.2) than with the newer kernel (3.14)...
Interesting, a lot of things happened in between; were you able to bisect/identify
a possible commit that causes this? How big is the difference?
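If the regression reproduces reliably, one way to narrow it down is a bisect between the two releases; this is a sketch only, assuming the mainline tree (3.4.2 is a stable-tree tag, so mainline v3.4 is the nearest starting point), and each step requires building, booting, and rerunning the throughput test:

```shell
# Mark the newer kernel bad and the older one good.
git bisect start v3.14 v3.4
# After testing each suggested commit, record the result:
git bisect good    # or: git bisect bad
# When the first bad commit is identified, review the log and reset.
git bisect log
git bisect reset
```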
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (5 preceding siblings ...)
2014-04-11 15:21 ` Daniel Borkmann
@ 2014-04-11 15:27 ` Vlad Yasevich
2014-04-11 15:35 ` Daniel Borkmann
` (28 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 15:27 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled. (For what it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
>
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable?
This still looks high. Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in sctp?
My guess is that a lot of it is going to be in memcpy(), but I am
curious.
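The requested capture might look like this (system-wide, with call graphs; the 30-second duration is a placeholder for however long the throughput test runs):

```shell
# Record samples on all CPUs, with call-graph data, while the SCTP test runs.
perf record -a -g -- sleep 30
# Summarize where cycles were spent; look for sctp_* and memcpy symbols.
perf report --sort symbol
```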
> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>
That's interesting. I'll have to look and see what might have changed here.
-vlad
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (6 preceding siblings ...)
2014-04-11 15:27 ` Vlad Yasevich
@ 2014-04-11 15:35 ` Daniel Borkmann
2014-04-11 18:19 ` Vlad Yasevich
` (27 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 15:35 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 05:27 PM, Vlad Yasevich wrote:
> On 04/11/2014 11:07 AM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled. (For what it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
>>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable?
>
> This still looks high. Could you run 'perf record -a' and 'perf report'
> to see where we are spending all of our time in sctp.
+1
> My guess is that a lot of it is going to be in memcpy(), but I am
> curious.
>
>> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>>
>> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
>
> That's interesting. I'll have to look and see what might have changed here.
I remember Fengguang's tests reporting the one below (from 3.11),
but the starting baseline was already quite low ...
commit ef2820a735f74ea60335f8ba3801b844f0cb184d
Author: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Date: Fri Feb 14 14:51:18 2014 +0100
net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (7 preceding siblings ...)
2014-04-11 15:35 ` Daniel Borkmann
@ 2014-04-11 18:19 ` Vlad Yasevich
2014-04-11 18:22 ` Butler, Peter
` (26 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 18:19 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading enabled. (For what it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
>
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable? Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>
Hi Peter
Could you run an experiment setting max_burst to 0?
The way the SCTP spec is written, upon every SACK, if the
stack has new data to send, it will only send max_burst*mtu
bytes. This may not always fill the current congestion window
and might preclude growth.
Setting max_burst to 0 will disable burst limitation. I am
curious to see if this would impact throughput on a low-rtt
link like yours.
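A sketch of the experiment, using the system-wide sysctl (there is also a per-socket SCTP_MAX_BURST socket option for changing this on one association only):

```shell
# Show the current burst limit (the default is 4 packets per SACK).
sysctl net.sctp.max_burst
# Disable burst limiting for the experiment.
sysctl -w net.sctp.max_burst=0
# Restore the default afterwards.
sysctl -w net.sctp.max_burst=4
```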
-vlad
>
>
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> Hi Peter,
>
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
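The tc-based latency emulation described above can be sketched with netem (eth0 is a placeholder interface name; run on both nodes for symmetric delay):

```shell
# Emulate one-way latency on the egress path with tc/netem.
tc qdisc add dev eth0 root netem delay 10ms     # add 10 ms one-way delay
tc qdisc change dev eth0 root netem delay 50ms  # adjust to 50 ms
tc qdisc del dev eth0 root                      # remove the emulation
```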
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do a more of apples-to-apples comparison, you need to disable
>> tso/gso on the sending node.
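The offload toggling Vlad describes can be done with ethtool (a sketch; eth0 is a placeholder interface name, and the thread later disables the receive-side offloads too):

```shell
# Disable segmentation/receive offloads so TCP and SCTP both send
# one MTU-sized frame per message.
ethtool -K eth0 tso off gso off gro off lro off
ethtool -k eth0 | grep -E 'segmentation|large-receive'   # verify
```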
>>
>> The reason is that even if you limit buffer sizes, tcp will still try
>> to do tso on the transmit side, thus coalescing your 1000-byte messages
>> into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte
>> MTU nic.
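As a side note on these numbers: the largest user message that fits in a single DATA chunk follows from the standard header sizes, and it comes out to 1452 rather than 1464 because the 12-byte SCTP common header must also be subtracted (a sketch, assuming IPv4 without options and one DATA chunk per packet):

```shell
# 1500-byte MTU minus IPv4 header (20), SCTP common header (12)
# and DATA chunk header (16) leaves the per-chunk payload.
echo $((1500 - 20 - 12 - 16))   # -> 1452
```

This matches the 1452-byte limit Peter reports later in the thread.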
>>
>> I would be interested to see the results. There could very well be issues.
>
> Agreed.
>
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
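One way to check is via ethtool's feature list (a sketch; eth0 is a placeholder, and the feature name tx-checksum-sctp is how the offload appears on kernels of this era):

```shell
# Query NIC offload features and filter for SCTP checksum support.
ethtool -k eth0 | grep -i sctp
```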
>
>> -vlad
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (8 preceding siblings ...)
2014-04-11 18:19 ` Vlad Yasevich
@ 2014-04-11 18:22 ` Butler, Peter
2014-04-11 18:40 ` Daniel Borkmann
` (25 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 18:22 UTC (permalink / raw)
To: linux-sctp
The difference between 3.14 and 3.4.2 is staggering - an order of magnitude or so. For example, using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage between 70-150 Mbps with 3.14.
Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such that it is always trying to recover. For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to the next (as you would expect):
[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
Time: Fri, 11 Apr 2014 18:19:15 GMT
Connecting to host 192.168.241.3, port 5201
Cookie: Lab200slot2.1397240355.069035.0d5b0f
[ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
[ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
[ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
[ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
[ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
[ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
[ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
[ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
[ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
[ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
(etc)
but with 3.14 the numbers are all over the place:
[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 17:56:21 GMT
Connecting to host 192.168.241.3, port 5201
Cookie: Lab200slot2.1397238981.812898.548918
[ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
[ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
[ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
[ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
[ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
[ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
[ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
[ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
[ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
[ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
[ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
[ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
[ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
[ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
[ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
[ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
[ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
(etc)
Note: the difference appears to be SCTP-specific, as I get exactly the same TCP throughput in both kernels.
-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com]
Sent: April-11-14 11:22 AM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 05:07 PM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading
> enabled. (For what
> it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
Ok, understood.
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of
> the association -
> i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.
> (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by
> approximately 40-70%
> over the range of network latencies I tested (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable? Is this the
> more-or-less accepted
> performance difference with the current LKSCTP implementation?
Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time - e.g. the way chunks are handled, the copies involved in the fast path, and our heavy use of atomic reference counting, among other open issues.
> Also, for what it's worth, I get better SCTP throughput numbers with
> the older kernel
> (3.4.2) than with the newer kernel (3.14)...
Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (9 preceding siblings ...)
2014-04-11 18:22 ` Butler, Peter
@ 2014-04-11 18:40 ` Daniel Borkmann
2014-04-11 18:41 ` Daniel Borkmann
` (24 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 18:40 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 08:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering - an order of magnitude or so. For example,
> using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I
> can only manage between 70-150 Mbps with 3.14.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such that it is always trying to
> recover. For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to
> the next (as you would expect):
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
> Time: Fri, 11 Apr 2014 18:19:15 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397240355.069035.0d5b0f
> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
> (etc)
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397238981.812898.548918
> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the same TCP
> throughput in both kernels.
Hmm, okay. :/ Could you further bisect on your side to narrow down from which
kernel onwards this behaviour can be seen?
Thanks,
Daniel
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 11:22 AM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled. (For what
>> it's worth, the checksum offload gives about a 20% throughput gain - but this is, of course, already included in the numbers I posted to this thread as I've been using the CRC offload all along.)
>
> Ok, understood.
>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of
>> the association -
>> i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'.
>> (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>> approximately 40-70%
>> over the range of network latencies I tested (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable? Is this the
>> more-or-less accepted
>> performance difference with the current LKSCTP implementation?
>
> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time - e.g. the way chunks are handled, the copies involved in the fast path, and our heavy use of atomic reference counting, among other open issues.
>
>> Also, for what it's worth, I get better SCTP throughput numbers with
>> the older kernel
>> (3.4.2) than with the newer kernel (3.14)...
>
> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (10 preceding siblings ...)
2014-04-11 18:40 ` Daniel Borkmann
@ 2014-04-11 18:41 ` Daniel Borkmann
2014-04-11 18:58 ` Butler, Peter
` (23 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-11 18:41 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 08:40 PM, Daniel Borkmann wrote:
> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering - an order of magnitude or so. For example,
>> using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I
>> can only manage between 70-150 Mbps with 3.14.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such that it is always trying to
>> recover. For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to
>> the next (as you would expect):
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397240355.069035.0d5b0f
>> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
>> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
>> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
>> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
>> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
>> (etc)
>>
>> but with 3.14 the numbers are all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397238981.812898.548918
>> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
>> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
>> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
>> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
>> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
>> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
>> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
>> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
>> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
>> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
>> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly the same TCP
>> throughput in both kernels.
>
> Hmm, okay. :/ Could you further bisect on your side to narrow down from which
> kernel onwards this behaviour can be seen?
Is that behaviour consistent between IPv4 and IPv6?
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (11 preceding siblings ...)
2014-04-11 18:41 ` Daniel Borkmann
@ 2014-04-11 18:58 ` Butler, Peter
2014-04-11 19:16 ` Butler, Peter
` (22 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 18:58 UTC (permalink / raw)
To: linux-sctp
I have the perf data (operf/opreport) and am trying to send it out - but our email server is rejecting it with: "Reason: content policy violation". I've contacted the IT help desk and will get it out to this mailing list ASAP.
-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: April-11-14 11:28 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading
> enabled. (For what it's worth, the checksum offload gives about a 20%
> throughput gain - but this is, of course, already included in the
> numbers I posted to this thread as I've been using the CRC offload all
> along.)
>
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the range of network latencies I tested (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable?
This still looks high. Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in sctp.
My guess is that a lot of it is going to be in memcpy(), but I am curious.
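The profiling run Vlad asks for can be sketched as follows (run on the sending node while the iperf3 test is active; the 30-second duration is an arbitrary choice, and the perf binary should match the running kernel):

```shell
# System-wide sample with call graphs, then inspect the report
# for sctp_* and memcpy hotspots.
perf record -a -g -- sleep 30
perf report
```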
> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>
That's interesting. I'll have to look and see what might have changed here.
-vlad
>
>
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> Hi Peter,
>
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do a more of apples-to-apples comparison, you need to disable
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try
>> to do tso on the transmit side, thus coalescing your 1000-byte
>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte
>> MTU nic.
>>
>> I would be interested to see the results. There could very well be issues.
>
> Agreed.
>
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>
>> -vlad
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (12 preceding siblings ...)
2014-04-11 18:58 ` Butler, Peter
@ 2014-04-11 19:16 ` Butler, Peter
2014-04-11 19:20 ` Vlad Yasevich
` (21 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 19:16 UTC (permalink / raw)
To: linux-sctp
What an excellent question. I just tried that now and you are definitely on to something. Whereas IPv4 is erratic on 3.14 (for SCTP), IPv6 is fairly smooth (see results below).
However, note that as previously mentioned I still get better throughput numbers with 3.4.2. For the no-latency (0.2 ms) test, the 3.4.2 kernel yields about 2.1 Gbps, whereas the 3.14 kernel yields only about 1.6 Gbps.
These results show that the erratic behaviour seen with kernel 3.14 appears to be confined to IPv4 SCTP only:
IPv6:
[root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 19:08:41 GMT
Connecting to host 2001:db8:0:f101::1, port 5201
Cookie: Lab200slot2.1397243321.714295.2b3f7c
[ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201
Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec
[ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec
[ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec
[ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec
[ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec
[ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec
[ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec
[ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec
[ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec
[ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec
[ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec
[ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec
[ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec
[ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec
(etc)
IPv4:
[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1400 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Fri, 11 Apr 2014 19:09:28 GMT
Connecting to host 192.168.240.3, port 5201
Cookie: Lab200slot2.1397243368.815040.7ecb3d
[ 4] local 192.168.240.2 port 36273 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 221 MBytes 1.85 Gbits/sec
[ 4] 1.00-2.42 sec 91.3 MBytes 541 Mbits/sec
[ 4] 2.42-3.00 sec 127 MBytes 1.83 Gbits/sec
[ 4] 3.00-4.00 sec 216 MBytes 1.81 Gbits/sec
[ 4] 4.00-5.51 sec 111 MBytes 617 Mbits/sec
[ 4] 5.51-6.75 sec 54.0 MBytes 365 Mbits/sec
[ 4] 6.75-7.00 sec 57.4 MBytes 1.89 Gbits/sec
[ 4] 7.00-9.55 sec 121 MBytes 399 Mbits/sec
[ 4] 9.55-9.56 sec 0.00 Bytes 0.00 bits/sec
[ 4] 9.56-10.00 sec 99.7 MBytes 1.88 Gbits/sec
[ 4] 10.00-11.00 sec 220 MBytes 1.85 Gbits/sec
[ 4] 11.00-12.34 sec 74.3 MBytes 466 Mbits/sec
(etc)
-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com]
Sent: April-11-14 2:42 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 08:40 PM, Daniel Borkmann wrote:
> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering - an order of magnitude or so. For example,
>> using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I
>> can only manage between 70-150 Mbps with 3.14.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such that it is always trying to
>> recover. For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to
>> the next (as you would expect):
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t
>> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2
>> 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397240355.069035.0d5b0f
>> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port
>> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
>> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
>> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
>> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
>> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
>> (etc)
>>
>> but with 3.14 the numbers are all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t
>> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1
>> SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397238981.812898.548918
>> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port
>> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
>> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
>> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
>> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
>> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
>> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
>> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45
>> sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes
>> 0.00 bits/sec
>> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
>> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
>> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
>> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
>> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec [ 4] 16.79-17.82
>> sec 5.94 MBytes 48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly
>> the same TCP throughput in both kernels.
>
> Hmm, okay. :/ Could you further bisect on your side to narrow down
> from which kernel onwards this behaviour can be seen?
Is that behaviour consistent between IPv4 and IPv6?
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (13 preceding siblings ...)
2014-04-11 19:16 ` Butler, Peter
@ 2014-04-11 19:20 ` Vlad Yasevich
2014-04-11 19:24 ` Butler, Peter
` (20 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 19:20 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 02:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering - an order of magnitude or so.
> For example, using precisely the same setup as before, whereas I get about 2.1 Gbps
> throughput with 3.4.2, I can only manage between 70-150 Mbps with 3.14.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover. For example, with 3.4.2 the 2.1
> Gbps throughput is quite consistent from one second to the next (as you
> would expect):
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397238981.812898.548918
> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the same TCP throughput in both kernels.
>
Ouch. That is not very good behavior... I wonder if this is
a side-effect of the new rwnd algorithm...
In fact, I think I do see a small problem with the algorithm.
Can you try this patch:
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8d198ae..c17592a 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -738,7 +738,7 @@ struct sctp_ulpevent
*sctp_ulpevent_make_rcvmsg(struct sctp_association *asoc,
* Since this is a clone of the original skb, only account for
* the data of this chunk as other chunks will be accounted separately.
*/
- sctp_ulpevent_init(event, 0, skb->len + sizeof(struct sk_buff));
+ sctp_ulpevent_init(event, 0, skb->len);
sctp_ulpevent_receive_data(event, asoc);
^ permalink raw reply related [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (14 preceding siblings ...)
2014-04-11 19:20 ` Vlad Yasevich
@ 2014-04-11 19:24 ` Butler, Peter
2014-04-11 20:14 ` Butler, Peter
` (19 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 19:24 UTC (permalink / raw)
To: linux-sctp
I can certainly try that patch, however see the previous email where Daniel suggested that the issue may be IPv4 only. I have subsequently tested it (email sent out 5 minutes ago) and he was right: IPv6 is smooth, whereas IPv4 is erratic.
Although even when using the smooth IPv6 behaviour, the 3.4.2 throughput is still better than 3.14; for example, 2.1 Gbps in the 'no' latency case (0.2 ms RTT) on 3.4.2 but only 1.6 Gbps with 3.14.
Should I try out the patch, or does the IPv4 vs IPv6 data shed new light on this?
-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: April-11-14 3:21 PM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 02:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering. An order of
magnitude or so. For example, using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage between 70-150 Mbps with 3.14 - a staggering difference.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover. For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to the next (as you would expect):
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397238981.812898.548918
> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the
> same TCP throughput in both kernels.
>
Ouch. That is not very good behavior... I wonder if this is a side-effect of the new rwnd algorithm...
In fact, I think I do see a small problem with the algorithm.
Can you try this patch:
<snipped>
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (15 preceding siblings ...)
2014-04-11 19:24 ` Butler, Peter
@ 2014-04-11 20:14 ` Butler, Peter
2014-04-11 20:18 ` Butler, Peter
` (18 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 20:14 UTC (permalink / raw)
To: linux-sctp
I tried setting /proc/sys/net/sctp/max_burst to 0 on both nodes (both sides of the association): doing so prevented iperf from sending any traffic at all. Only the initial 4-way handshake and subsequent heartbeat packets were transmitted. When I kill the process (client and/or server) it reports 0 bits/s throughput.
I then tried various different values from 1 to 16 (default was 4), they all result in about the same 2.1 Gbps throughput (for the low-latency (0.2 ms RTT) case) as originally reported.
-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: April-11-14 2:20 PM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading
> enabled. (For what it's worth, the checksum offload gives about a 20%
> throughput gain - but this is, of course, already included in the
> numbers I posted to this thread as I've been using the CRC offload all
> along.)
>
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable? Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>
Hi Peter
Could you run an experiment setting max_burst to 0?
The way the SCTP spec is written, upon every SACK, if the stack has new data to send, it will only send max_burst*mtu bytes. This may not always fill the current congestion window and might preclude growth.
Setting max_burst to 0 will disable burst limitation. I am curious to see if this would impact throughput on a low-rtt link like yours.
-vlad
>
>
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> Hi Peter,
>
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do more of an apples-to-apples comparison, you need to disable
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try
>> to do tso on the transmit side, thus coalescing your 1000-byte
>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte
>> MTU nic.
>>
>> I would be interested to see the results. There could very well be issues.
>
> Agreed.
>
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>
>> -vlad
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (16 preceding siblings ...)
2014-04-11 20:14 ` Butler, Peter
@ 2014-04-11 20:18 ` Butler, Peter
2014-04-11 20:51 ` Vlad Yasevich
` (17 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 20:18 UTC (permalink / raw)
To: linux-sctp
It may take a little time to do the bisection. Can you provide me with a guesstimate as to which kernel(s) may have introduced this behaviour?
-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com]
Sent: April-11-14 2:40 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 08:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering. An order of
> magnitude or so. For example, using precisely the same setup as before,
> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
> between 70-150 Mbps with 3.14 - a staggering difference.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover. For example, with 3.4.2 the 2.1 Gbps
> throughput is quite consistent from one second to the next (as you would
> expect):
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
> Time: Fri, 11 Apr 2014 18:19:15 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397240355.069035.0d5b0f
> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
> (etc)
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397238981.812898.548918
> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the
> same TCP throughput in both kernels.
Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
Thanks,
Daniel
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 11:22 AM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled. (For what
> > it's worth, the checksum offload gives about a 20% throughput gain
> > - but this is, of course, already included in the numbers I posted
> > to this thread as I've been using the CRC offload all along.)
>
> Ok, understood.
>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides
>> of the association - i.e. on both endpoint nodes), and using 1452-byte
>> messages instead of 1000-byte messages. With this new setup, the TCP
>> performance drops significantly, as expected, while the SCTP performance
>> is boosted, and the playing field is somewhat more 'level'. (Note that I
>> could not use 1464-byte messages as suggested by Vlad, as anything above
>> 1452 cut the SCTP performance in half - must have hit the segmentation
>> limit at this slightly lower message size. MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>> approximately 40-70% over the range of network latencies I tested with
>> (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant,
>> but nowhere near the 200% better (i.e. 3 times the throughput) I was
>> getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable? Is this the more-or-less
>> accepted performance difference with the current LKSCTP implementation?
>
> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time: e.g. the way chunks are handled, imho, the copies involved in the fast path, our heavy use of atomic reference counting, and other open issues.
>
>> Also, for what it's worth, I get better SCTP throughput numbers with
>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>
> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (17 preceding siblings ...)
2014-04-11 20:18 ` Butler, Peter
@ 2014-04-11 20:51 ` Vlad Yasevich
2014-04-11 20:53 ` Vlad Yasevich
` (16 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 20:51 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 03:24 PM, Butler, Peter wrote:
> I can certainly try that patch, however see the previous email where Daniel suggested that the issue may be IPv4 only. I have subsequently tested it (email sent out 5 minutes ago) and he was right: IPv6 is smooth, whereas IPv4 is erratic.
>
> Although even when using the smooth IPv6 behaviour, the 3.4.2 throughput is still better than 3.14; for example, 2.1 Gbps in the 'no' latency case (0.2 ms RTT) on 3.4.2 but only 1.6 Gbps with 3.14.
>
> Should I try out the patch, or does the IPv4 vs IPv6 data shed new light on this?
No, the patch is actually wrong, so don't worry about it.
The v4 vs v6 data is definitely something we need to address.
Thanks
-vlad
>
>
>
>
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: April-11-14 3:21 PM
> To: Butler, Peter; Daniel Borkmann
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 02:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering. An order of
> magnitude or so. For example, using precisely the same setup as before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage between 70-150 Mbps with 3.14 - a staggering difference.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover. For example, with 3.4.2 the 2.1 Gbps throughput is quite consistent from one second to the next (as you would expect):
>>
>> but with 3.14 the numbers are all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397238981.812898.548918
>> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
>> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
>> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
>> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
>> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
>> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
>> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
>> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
>> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
>> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
>> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly the
>> same TCP throughput in both kernels.
>>
>
> Ouch. That is not very good behavior... I wonder if this is a side-effect of the new rwnd algorithm...
>
> In fact, I think I do see a small problem with the algorithm.
>
> Can you try this patch:
>
> <snipped>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (18 preceding siblings ...)
2014-04-11 20:51 ` Vlad Yasevich
@ 2014-04-11 20:53 ` Vlad Yasevich
2014-04-11 20:57 ` Butler, Peter
` (15 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-11 20:53 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 04:14 PM, Butler, Peter wrote:
> I tried setting /proc/sys/net/sctp/max_burst to 0 on both nodes (both sides of the association): doing so prevented iperf from sending any traffic at all. Only the initial 4-way handshake and subsequent heartbeat packets were transmitted. When I kill the process (client and/or server) it reports 0 bits/s throughput.
>
> I then tried various different values from 1 to 16 (default was 4), they all result in about the same 2.1 Gbps throughput (for the low-latency (0.2 ms RTT) case) as originally reported.
>
>
I didn't realize that max_burst=0 only works in the 3.14 kernel.
However, the data for burst of 16 definitely helps. I was concerned
that this might be a congestion window issue.
Thanks
-vlad
>
>
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: April-11-14 2:20 PM
> To: Butler, Peter; Daniel Borkmann
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 11:07 AM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled. (For what it's worth, the checksum offload gives about a 20%
>> throughput gain - but this is, of course, already included in the
>> numbers I posted to this thread as I've been using the CRC offload all
>> along.)
>>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable? Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>>
>> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
>
> Hi Peter
>
> Could you run an experiment setting max_burst to 0?
>
> The way the SCTP spec is written, upon every SACK, if the stack has new data to send, it will only send max_burst*mtu bytes. This may not always fill the current congestion window and might preclude growth.
>
> Setting max_burst to 0 will disable burst limitation. I am curious to see if this would impact throughput on a low-rtt link like yours.
>
> -vlad
>
>
>>
>>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 3:43 AM
>> To: Vlad Yasevich
>> Cc: Butler, Peter; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> Hi Peter,
>>
>> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>>>>
>>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>>>>
>>>> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>>
>>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>>
>>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>>
>>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>>
>>> To do more of an apples-to-apples comparison, you need to disable
>>> tso/gso on the sending node.
>>>
>>> The reason is that even if you limit buffer sizes, tcp will still try
>>> to do tso on the transmit side, thus coalescing your 1000-byte
>>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>>
>>> SCTP, on the other hand, has to preserve message boundaries which
>>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>>
>>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte
>>> MTU nic.
>>>
>>> I would be interested to see the results. There could very well be issues.
>>
>> Agreed.
>>
>> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>>
>>> -vlad
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (19 preceding siblings ...)
2014-04-11 20:53 ` Vlad Yasevich
@ 2014-04-11 20:57 ` Butler, Peter
2014-04-11 23:58 ` Daniel Borkmann
` (14 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-11 20:57 UTC (permalink / raw)
To: linux-sctp
I just realized that I didn't mention which kernel I tested the max_burst values with: it was 3.4.2. (I haven't tested it with the 3.14 kernel.)
So, to recap: on 3.4.2 max_burst=0 prevented any DATA traffic from being sent, and max_burst values from 1 to 16 yielded the same throughput of ~2.1 Gbps on the low latency network setup.
-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: April-11-14 4:54 PM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 04:14 PM, Butler, Peter wrote:
> I tried setting /proc/sys/net/sctp/max_burst to 0 on both nodes (both sides of the association): doing so prevented iperf from sending any traffic at all. Only the initial 4-way handshake and subsequent heartbeat packets were transmitted. When I kill the process (client and/or server) it reports 0 bits/s throughput.
>
> I then tried various different values from 1 to 16 (default was 4), they all result in about the same 2.1 Gbps throughput (for the low-latency (0.2 ms RTT) case) as originally reported.
>
>
I didn't realize that max_burst=0 only works in the 3.14 kernel.
However, the data for burst of 16 definitely helps. I was concerned that this might be a congestion window issue.
Thanks
-vlad
>
>
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: April-11-14 2:20 PM
> To: Butler, Peter; Daniel Borkmann
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 11:07 AM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled. (For what it's worth, the checksum offload gives about a
>> 20% throughput gain - but this is, of course, already included in the
>> numbers I posted to this thread as I've been using the CRC offload
>> all
>> along.)
>>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable? Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>>
>> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
>
> Hi Peter
>
> Could you run an experiment setting max_burst to 0?
>
> The way the SCTP spec is written, upon every SACK, if the stack has new data to send, it will only send max_burst*mtu bytes. This may not always fill the current congestion window and might preclude growth.
>
> Setting max_burst to 0 will disable burst limitation. I am curious to see if this would impact throughput on a low-rtt link like yours.
>
> -vlad
>
>
>>
>>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 3:43 AM
>> To: Vlad Yasevich
>> Cc: Butler, Peter; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> Hi Peter,
>>
>> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>>>>
>>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>>>>
>>>> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>>
>>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>>
>>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>>
>>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>>
>>> To do more of an apples-to-apples comparison, you need to disable
>>> tso/gso on the sending node.
>>>
>>> The reason is that even if you limit buffer sizes, tcp will still
>>> try to do tso on the transmit side, thus coalescing your 1000-byte
>>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>>
>>> SCTP, on the other hand, has to preserve message boundaries which
>>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>>
>>> My recommendation is to use 1464 byte message for SCTP on a 1500
>>> byte MTU nic.
>>>
>>> I would be interested to see the results. There could very well be issues.
>>
>> Agreed.
>>
>> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>>
>>> -vlad
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
From: Daniel Borkmann @ 2014-04-11 23:58 UTC (permalink / raw)
To: linux-sctp
On 04/11/2014 10:18 PM, Butler, Peter wrote:
> It may take a little time to do the bisection. Can you provide me with a
> guesstimate as to which kernel(s) may have introduced this behaviour?
Just out of curiosity, could you do a ...
git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
... on your tree and try to see what you get with that?
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 2:40 PM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering. An order of
>> magnitude or so. For example, using the precisely same setup as
>> before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can
>> only manage between 70-150 Mbps with 3.14 - a staggering difference.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>> that it is always trying to recover. For example, with 3.4.2 the 2.1
>> Gbps throughput is quite consistent from one second to the next (as
>> you would expect):
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397240355.069035.0d5b0f
>> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
>> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
>> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
>> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
>> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
>> (etc)
>>
>> but with 3.14 the numbers are all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397238981.812898.548918
>> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
>> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
>> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
>> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
>> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
>> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
>> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
>> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
>> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
>> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
>> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly the
>> same TCP throughput in both kernels.
>
> Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
>
> Thanks,
>
> Daniel
>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 11:22 AM
>> To: Butler, Peter
>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>>> enabled. (For what it's worth, the checksum offload gives about a
>>> 20% throughput gain - but this is, of course, already included in
>>> the numbers I posted to this thread as I've been using the CRC
>>> offload all along.)
>>
>> Ok, understood.
>>
>>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides
>>> of the association - i.e. on both endpoint nodes), and using
>>> 1452-byte messages instead of 1000-byte messages. With this new
>>> setup, the TCP performance drops significantly, as expected, while
>>> the SCTP performance is boosted, and the playing field is somewhat
>>> more 'level'. (Note that I could not use 1464-byte messages as
>>> suggested by Vlad, as anything above 1452 cut the SCTP performance
>>> in half - must have hit the segmentation limit at this slightly
>>> lower message size. MTU is 1500.)
>>>
>>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>>> approximately 40-70% over the various range of network latencies I
>>> tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is
>>> still significant, but nowhere near the 200% better (i.e. 3 times
>>> the throughput) I was getting before.
>>>
>>> Does this value (i.e. 40-70%) sound reasonable? Is this the
>>> more-or-less accepted performance difference with the current
>>> LKSCTP implementation?
>>
>> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time, e.g. the way chunks are handled, imho, and copies involved in fast path, also that we're heavily using atomic reference counting, and other open issues.
>>
>>> Also, for what it's worth, I get better SCTP throughput numbers
>>> with the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
>> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>>
* Re: Is SCTP throughput really this low compared to TCP?
From: Dongsheng Song @ 2014-04-12 7:27 UTC (permalink / raw)
To: linux-sctp
Hi all,
I found a strange SCTP stream behavior: when the message size is equal
to the recv buffer size, the performance is VERY LOW (just 1% !!!).
TCP has no such issue; it only drops about 20%.
*) SCTP message size = 128K, recvbuffer size = 128K
netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_STREAM -- -m 128K -S 64K
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.201 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 212992 131072 10.04 6.16
*) message size = 64K, recvbuffer size = 128K
Then I halve the message size:
netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_STREAM -- -m 64K -S 64K
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.201 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 212992 65536 10.00 596.26
But TCP has no such issue:
*) TCP message size = 128K, recvbuffer size = 128K
netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_STREAM -- -m 128K -S 64K
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.201 () port 0 AF_INET
./netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_STREAM -- -m 128K -S 64K
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 131072 10.00 2368.84
*) TCP message size = 128K, recvbuffer size = 256K
netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_STREAM -- -m 128K -S 128K
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.201 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
262144 16384 131072 10.00 2993.15
I ran these tests on 2 VMware VMs with Intel Core i3-2120 @ 3.30GHz,
running Ubuntu 14.04 LTS, Linux kernel 3.13.0-24-generic.
SCTP_RR is very good, but SCTP_STREAM has a lot of room for improvement:
[9250.98] netperf -H 10.0.0.201 -p 1234 -l 10 -t UDP_RR -- -r 960,512
[9070.85] netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_RR -- -r 960,512
[2127.47] netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_CRR -- -r 960,512
[8126.60] netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_RR -- -r 960,512
[3380.84] netperf -H 10.0.0.201 -p 1234 -l 10 -t TCP_STREAM -- -m 16384
[ 952.17] netperf -H 10.0.0.201 -p 1234 -l 10 -t UDP_STREAM -- -m 65504
[ 699.98] netperf -H 10.0.0.201 -p 1234 -l 10 -t SCTP_STREAM -- -m 128K -s 512K -M 128K -S 256K
Regards,
Dongsheng
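One hedged way to read these numbers: when a single 128 KiB message consumes the entire advertised receive window, the sender can keep at most one message in flight and then stalls until the receiver reopens the window (plausibly on the order of the delayed-SACK interval, ~200 ms by default). The stall duration below is an assumption for illustration, not a measurement:

```python
# Toy flow-control model: send one message, stall until the receive
# window reopens, repeat.  The 200 ms stall is an assumed figure.
def stalled_throughput_mbps(msg_bytes, stall_s):
    return msg_bytes * 8 / stall_s / 1e6

print(round(stalled_throughput_mbps(131072, 0.200), 2))  # 128 KiB messages
```

That lands in the same ballpark as the 6.16 Mbit/s netperf reported for the 128K message / 128K buffer case, versus ~600 Mbit/s once two messages fit in the receive buffer, which is at least consistent with a window-stall explanation.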
* RE: Is SCTP throughput really this low compared to TCP?
From: Butler, Peter @ 2014-04-14 14:52 UTC (permalink / raw)
To: linux-sctp
I started the bisection process - again, this will take some time and I (unfortunately) can't dedicate all my time to this at work (other fires for me to put out here as well). However, on a hunch - based on TIPC work I had previously done, which suggested that the 3.10.x kernel stream was the last before significant changes were made to the underlying net API - I tested the latest kernel in this stream, namely 3.10.36.
As I suspected, this kernel is still 'good' - that is, there is no erratic SCTP/IPv4 throughput behaviour. However, throughput is still better with the 3.4.2 kernel (2.1 Gbps as opposed to 1.75 Gbps).
SUMMARY so far for SCTP+IPv4:
3.4.2 kernel: smooth throughput 2.1 Gbps
3.10.36 kernel: smooth throughput 1.75 Gbps
3.14 kernel: highly erratic, terrible throughput
-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com]
Sent: April-11-14 2:40 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 08:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering. An order of
> magnitude or so. For example, using the precisely same setup as
> before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can
> only manage between 70-150 Mbps with 3.14 - a staggering difference.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover. For example, with 3.4.2 the 2.1
> Gbps throughput is quite consistent from one second to the next (as
> you would expect):
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
> Time: Fri, 11 Apr 2014 18:19:15 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397240355.069035.0d5b0f
> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
> (etc)
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397238981.812898.548918
> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the
> same TCP
> throughput in both kernels.
Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
Thanks,
Daniel
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 11:22 AM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled. (For what it's worth, the checksum offload gives about a
>> 20% throughput gain - but this is, of course, already included in
>> the numbers I posted to this thread as I've been using the CRC
>> offload all along.)
>
> Ok, understood.
>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides
>> of the association - i.e. on both endpoint nodes), and using
>> 1452-byte messages instead of 1000-byte messages. With this new
>> setup, the TCP performance drops significantly, as expected, while
>> the SCTP performance is boosted, and the playing field is somewhat
>> more 'level'. (Note that I could not use 1464-byte messages as
>> suggested by Vlad, as anything above 1452 cut the SCTP performance
>> in half - must have hit the segmentation limit at this slightly
>> lower message size. MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>> approximately 40-70% over the various range of network latencies I
>> tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is
>> still significant, but nowhere near the 200% better (i.e. 3 times
>> the throughput) I was getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable? Is this the
>> more-or-less accepted performance difference with the current
>> LKSCTP implementation?
>
> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time, e.g. the way chunks are handled, imho, and copies involved in fast path, also that we're heavily using atomic reference counting, and other open issues.
>
>> Also, for what it's worth, I get better SCTP throughput numbers
>> with the older kernel (3.4.2) than with the newer kernel (3.14)...
>
> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>
* RE: Is SCTP throughput really this low compared to TCP?
From: Butler, Peter @ 2014-04-14 15:49 UTC (permalink / raw)
To: linux-sctp
Here are some perf numbers. Note that these were obtained with operf/opreport. Only the top 20 or so lines are shown here.
Identical load test performed on 3.4.2 and 3.14.
3.4.2:
CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples cumulative samples % cumulative % linenr info image name symbol name
130002 130002 4.9435 4.9435 copy_user_64.S:240 vmlinux copy_user_generic_string
125955 255957 4.7896 9.7330 memcpy_64.S:59 vmlinux memcpy
71026 326983 2.7008 12.4339 spinlock.c:136 vmlinux _raw_spin_lock
57138 384121 2.1727 14.6066 slub.c:409 vmlinux cmpxchg_double_slab
56559 440680 2.1507 16.7573 slub.c:2208 vmlinux __slab_alloc
53058 493738 2.0176 18.7749 ixgbe_main.c:2952 ixgbe.ko ixgbe_poll
51541 545279 1.9599 20.7348 slub.c:2439 vmlinux __slab_free
49916 595195 1.8981 22.6329 ip_tables.c:294 vmlinux ipt_do_table
42406 637601 1.6125 24.2455 ixgbe_main.c:7824 ixgbe.ko ixgbe_xmit_frame_ring
40929 678530 1.5564 25.8018 slub.c:3463 vmlinux kfree
40349 718879 1.5343 27.3361 core.c:132 vmlinux nf_iterate
35521 754400 1.3507 28.6869 output.c:347 sctp.ko sctp_packet_transmit
34071 788471 1.2956 29.9825 outqueue.c:1342 sctp.ko sctp_check_transmitted
33962 822433 1.2914 31.2739 slub.c:2601 vmlinux kmem_cache_free
33450 855883 1.2720 32.5459 outqueue.c:735 sctp.ko sctp_outq_flush
33005 888888 1.2551 33.8009 skbuff.c:172 vmlinux __alloc_skb
29231 918119 1.1115 34.9125 socket.c:1565 sctp.ko sctp_sendmsg
27950 946069 1.0628 35.9753 (no location information) libc-2.14.90.so __memmove_ssse3_back
26718 972787 1.0160 36.9913 (no location information) nf_conntrack.ko nf_conntrack_in
26589 999376 1.0111 38.0023 slub.c:4049 vmlinux __kmalloc_node_track_caller
26449 1025825 1.0058 39.0081 slub.c:2375 vmlinux kmem_cache_alloc
26211 1052036 0.9967 40.0048 sm_sideeffect.c:1074 sctp.ko sctp_do_sm
25527 1077563 0.9707 40.9755 slub.c:2404 vmlinux kmem_cache_alloc_node
23970 1101533 0.9115 41.8870 (no location information) libc-2.14.90.so _int_free
23266 1124799 0.8847 42.7717 memset_64.S:62 vmlinux memset
22976 1147775 0.8737 43.6454 (no location information) nf_conntrack.ko hash_conntrack_raw
21855 1169630 0.8311 44.4764 chunk.c:175 sctp.ko sctp_datamsg_from_user
21730 1191360 0.8263 45.3027 list_debug.c:24 vmlinux __list_add
21252 1212612 0.8081 46.1109 dev.c:3151 vmlinux __netif_receive_skb
20742 1233354 0.7887 46.8996 (no location information) libc-2.14.90.so _int_malloc
19955 1253309 0.7588 47.6584 input.c:130 sctp.ko sctp_rcv
3.14:
CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples cumulative samples % cumulative % linenr info image name symbol name
168021 168021 6.1446 6.1446 copy_user_64.S:183 vmlinux-3.14.0 copy_user_generic_string
85199 253220 3.1158 9.2604 memcpy_64.S:59 vmlinux-3.14.0 memcpy
80133 333353 2.9305 12.1909 spinlock.c:174 vmlinux-3.14.0 _raw_spin_lock_bh
74086 407439 2.7094 14.9003 spinlock.c:150 vmlinux-3.14.0 _raw_spin_lock
51878 459317 1.8972 16.7975 ixgbe_main.c:6930 ixgbe.ko ixgbe_xmit_frame_ring
49354 508671 1.8049 18.6024 slub.c:2538 vmlinux-3.14.0 __slab_free
39103 547774 1.4300 20.0324 outqueue.c:706 sctp.ko sctp_outq_flush
37775 585549 1.3815 21.4139 outqueue.c:1304 sctp.ko sctp_check_transmitted
37514 623063 1.3719 22.7858 output.c:380 sctp.ko sctp_packet_transmit
37320 660383 1.3648 24.1506 slub.c:2700 vmlinux-3.14.0 kmem_cache_free
36147 696530 1.3219 25.4725 ip_tables.c:294 vmlinux-3.14.0 ipt_do_table
35494 732024 1.2980 26.7705 sm_sideeffect.c:1100 sctp.ko sctp_do_sm
35452 767476 1.2965 28.0670 core.c:135 vmlinux-3.14.0 nf_iterate
34697 802173 1.2689 29.3359 slub.c:2281 vmlinux-3.14.0 __slab_alloc
33890 836063 1.2394 30.5753 slub.c:415 vmlinux-3.14.0 cmpxchg_double_slab
33566 869629 1.2275 31.8028 (no location information) libc-2.14.90.so _int_free
33228 902857 1.2152 33.0180 socket.c:1590 sctp.ko sctp_sendmsg
32774 935631 1.1986 34.2166 slub.c:3381 vmlinux-3.14.0 kfree
30359 965990 1.1102 35.3268 (no location information) libc-2.14.90.so __memmove_ssse3_back
28905 994895 1.0571 36.3839 list_debug.c:25 vmlinux-3.14.0 __list_add
25888 1020783 0.9467 37.3306 skbuff.c:199 vmlinux-3.14.0 __alloc_skb
25490 1046273 0.9322 38.2628 fib_trie.c:1399 vmlinux-3.14.0 fib_table_lookup
25232 1071505 0.9227 39.1856 nf_conntrack_core.c:376 nf_conntrack.ko __nf_conntrack_find_get
24114 1095619 0.8819 40.0674 chunk.c:168 sctp.ko sctp_datamsg_from_user
24067 1119686 0.8801 40.9476 ixgbe_main.c:2020 ixgbe.ko ixgbe_clean_rx_irq
23972 1143658 0.8767 41.8242 (no location information) libc-2.14.90.so _int_malloc
22117 1165775 0.8088 42.6331 ip_output.c:215 vmlinux-3.14.0 ip_finish_output
22037 1187812 0.8059 43.4390 slub.c:3854 vmlinux-3.14.0 __kmalloc_node_track_caller
21847 1209659 0.7990 44.2379 dev.c:2546 vmlinux-3.14.0 dev_hard_start_xmit
21564 1231223 0.7886 45.0265 slub.c:2481 vmlinux-3.14.0 kmem_cache_alloc
20659 1251882 0.7555 45.7820 socket.c:2049 sctp.ko sctp_recvmsg
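For lining up the two profiles symbol-by-symbol, a small script can parse the opreport dumps; the column layout (samples, cumulative samples, %, cumulative %, location, image, symbol) is assumed to match the output above. The embedded sample lines are just the first rows from each profile:

```python
import re

def parse_opreport(text):
    """Map symbol name -> % of samples, for rows shaped like the
    opreport output above.  The symbol is the last whitespace-separated
    field, so '(no location information)' rows parse too."""
    pct = {}
    for line in text.splitlines():
        m = re.match(r'\s*\d+\s+\d+\s+([\d.]+)\s+[\d.]+\s+\S.*\s(\S+)$', line)
        if m:
            pct[m.group(2)] = float(m.group(1))
    return pct

old = parse_opreport("""\
130002 130002 4.9435 4.9435 copy_user_64.S:240 vmlinux copy_user_generic_string
71026 326983 2.7008 12.4339 spinlock.c:136 vmlinux _raw_spin_lock
""")
new = parse_opreport("""\
168021 168021 6.1446 6.1446 copy_user_64.S:183 vmlinux-3.14.0 copy_user_generic_string
74086 407439 2.7094 14.9003 spinlock.c:150 vmlinux-3.14.0 _raw_spin_lock
""")

# Print the shift for every symbol present in both profiles.
for sym in sorted(old.keys() & new.keys()):
    print(f"{sym}: {old[sym]:.2f}% -> {new[sym]:.2f}%")
```

Run over the full dumps, this makes the 3.4.2 vs. 3.14 shifts (e.g. the jump in `copy_user_generic_string` and the appearance of `_raw_spin_lock_bh`) easy to scan.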
-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: April-11-14 11:28 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading
> enabled. (For what it's worth, the checksum offload gives about a 20%
> throughput gain - but this is, of course, already included in the
> numbers I posted to this thread as I've been using the CRC offload all
> along.)
>
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable?
This still looks high. Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in sctp.
My guess is that a lot of it is going to be in memcpy(), but I am curious.
> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>
That's interesting. I'll have to look at see what might have changed here.
-vlad
>
>
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> Hi Peter,
>
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do more of an apples-to-apples comparison, you need to disable
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, TCP will still try
>> to do TSO on the transmit side, coalescing your 1000-byte
>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464 byte message for SCTP on a 1500 byte
>> MTU nic.
>>
>> I would be interested to see the results. There could very well be issues.
>
> Agreed.
>
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>
>> -vlad
* RE: Is SCTP throughput really this low compared to TCP?
From: Butler, Peter @ 2014-04-14 16:43 UTC (permalink / raw)
To: linux-sctp
With the git revert command applied, the SCTP+IPv4 performance on 3.14.0 is back to normal! Same throughput as 3.4.2 and steady and smooth:
[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
Time: Mon, 14 Apr 2014 16:40:48 GMT
Connecting to host 192.168.240.3, port 5201
Cookie: Lab200slot2.1397493648.413274.65e131
[ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec
[ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec
[ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec
[ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec
[ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec
[ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec
[ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec
[ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec
-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com]
Sent: April-11-14 7:58 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 10:18 PM, Butler, Peter wrote:
> It may take a little time to do the bisection. Can you provide me
> with a
> guesstimate as to which kernel(s) may have introduced this behaviour?
Just out of curiosity, could you do a ...
git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
... on your tree and try to see what you get with that?
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 2:40 PM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>> The difference between 3.14 and 3.4.2 is staggering. An order of
>> magnitude or so. For example, using the precisely same setup as
>> before, whereas I get about 2.1 Gbps throughput with 3.4.2, I can
>> only manage between 70-150 Mbps with 3.14 - a staggering difference.
>>
>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>> that it is always trying to recover. For example, with 3.4.2 the 2.1
>> Gbps throughput is quite consistent from one second to the next (as
>> you would expect):
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397240355.069035.0d5b0f
>> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
>> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
>> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
>> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
>> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
>> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
>> (etc)
>>
>> but with 3.14 the numbers are all over the place:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>> Connecting to host 192.168.241.3, port 5201
>> Cookie: Lab200slot2.1397238981.812898.548918
>> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
>> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
>> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
>> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
>> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
>> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
>> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
>> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
>> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
>> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
>> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
>> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
>> (etc)
>>
>> Note: the difference appears to be SCTP-specific, as I get exactly
>> the same TCP throughput in both kernels.
>
> Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
>
> Thanks,
>
> Daniel
>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 11:22 AM
>> To: Butler, Peter
>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>>> enabled. (For what it's worth, the checksum offload gives about a 20%
>>> throughput gain - but this is, of course, already included in the
>>> numbers I posted to this thread as I've been using the CRC offload
>>> all along.)
>>
>> Ok, understood.
>>
>>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides
>>> of the association - i.e. on both endpoint nodes), and using 1452-byte
>>> messages instead of 1000-byte messages. With this new setup, the TCP
>>> performance drops significantly, as expected, while the SCTP
>>> performance is boosted, and the playing field is somewhat more 'level'.
>>> (Note that I could not use 1464-byte messages as suggested by Vlad, as
>>> anything above 1452 cut the SCTP performance in half - must have hit
>>> the segmentation limit at this slightly lower message size. MTU is 1500.)
>>>
>>> So comparing "apples to apples" now, TCP only outperforms SCTP by
>>> approximately 40-70% over the range of network latencies I tested with
>>> (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant,
>>> but nowhere near the 200% better (i.e. 3 times the throughput) I was
>>> getting before.
>>>
>>> Does this value (i.e. 40-70%) sound reasonable? Is this the
>>> more-or-less accepted performance difference with the current LKSCTP
>>> implementation?
>>
>> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time: e.g. the way chunks are handled, the copies involved in the fast path, our heavy use of atomic reference counting, and other open issues.
>>
>>> Also, for what it's worth, I get better SCTP throughput numbers with
>>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>>
>> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (25 preceding siblings ...)
2014-04-14 16:43 ` Butler, Peter
@ 2014-04-14 16:45 ` Daniel Borkmann
2014-04-14 16:47 ` Butler, Peter
` (8 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-14 16:45 UTC (permalink / raw)
To: linux-sctp
Ok, thanks! I'll send out a revert today, as this is otherwise
catastrophic ... once that is done, we/the developer from NSN can still
think about his patch and how to solve that differently.
On 04/14/2014 06:43 PM, Butler, Peter wrote:
> With the git revert command applied, the SCTP+IPv4 performance on 3.14.0 is back to normal! Same throughput as 3.4.2 and steady and smooth:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
> Time: Mon, 14 Apr 2014 16:40:48 GMT
> Connecting to host 192.168.240.3, port 5201
> Cookie: Lab200slot2.1397493648.413274.65e131
> [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec
> [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec
> [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec
> [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec
> [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec
> [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec
> [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec
> [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec
>
>
>
>
>
>
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 7:58 PM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 10:18 PM, Butler, Peter wrote:
>> It may take a little time to do the bisection. Can you provide me with
>> a guesstimate as to which kernel(s) may have introduced this behaviour?
>
> Just out of curiosity, could you do a ...
>
> git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
>
> ... on your tree and try to see what you get with that?
>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 2:40 PM
>> To: Butler, Peter
>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>>> The difference between 3.14 and 3.4.2 is staggering - an order of
>>> magnitude or so. For example, using precisely the same setup as before,
>>> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
>>> between 70-150 Mbps with 3.14.
>>>
>>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>>> that it is always trying to recover. For example, with 3.4.2 the
>>> 2.1 Gbps throughput is quite consistent from one second to the next
>>> (as you would expect):
>>>
>>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t
>>> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2
>>> 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>>> Connecting to host 192.168.241.3, port 5201
>>> Cookie: Lab200slot2.1397240355.069035.0d5b0f
>>> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port
>>> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>>> [ ID] Interval Transfer Bandwidth
>>> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
>>> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
>>> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
>>> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
>>> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
>>> (etc)
>>>
>>> but with 3.14 the numbers are all over the place:
>>>
>>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t
>>> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1
>>> SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>>> Connecting to host 192.168.241.3, port 5201
>>> Cookie: Lab200slot2.1397238981.812898.548918
>>> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port
>>> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>>> [ ID] Interval Transfer Bandwidth
>>> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
>>> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
>>> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
>>> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
>>> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
>>> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
>>> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
>>> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
>>> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
>>> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
>>> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
>>> (etc)
>>>
>>> Note: the difference appears to be SCTP-specific, as I get exactly
>>> the same TCP throughput in both kernels.
>>
>> Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
>>
>> Thanks,
>>
>> Daniel
>>
>>> -----Original Message-----
>>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>>> Sent: April-11-14 11:22 AM
>>> To: Butler, Peter
>>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>>
>>> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>>>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>>>> enabled. (For what it's worth, the checksum offload gives about a 20%
>>>> throughput gain - but this is, of course, already included in the
>>>> numbers I posted to this thread as I've been using the CRC offload
>>>> all along.)
>>>
>>> Ok, understood.
>>>
>>>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides
>>>> of the association - i.e. on both endpoint nodes), and using 1452-byte
>>>> messages instead of 1000-byte messages. With this new setup, the TCP
>>>> performance drops significantly, as expected, while the SCTP
>>>> performance is boosted, and the playing field is somewhat more 'level'.
>>>> (Note that I could not use 1464-byte messages as suggested by Vlad, as
>>>> anything above 1452 cut the SCTP performance in half - must have hit
>>>> the segmentation limit at this slightly lower message size. MTU is 1500.)
>>>>
>>>> So comparing "apples to apples" now, TCP only outperforms SCTP by
>>>> approximately 40-70% over the range of network latencies I tested with
>>>> (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant,
>>>> but nowhere near the 200% better (i.e. 3 times the throughput) I was
>>>> getting before.
>>>>
>>>> Does this value (i.e. 40-70%) sound reasonable? Is this the
>>>> more-or-less accepted performance difference with the current LKSCTP
>>>> implementation?
>>>
>>> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time: e.g. the way chunks are handled, the copies involved in the fast path, our heavy use of atomic reference counting, and other open issues.
>>>
>>>> Also, for what it's worth, I get better SCTP throughput numbers with
>>>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>>>
>>> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (26 preceding siblings ...)
2014-04-14 16:45 ` Daniel Borkmann
@ 2014-04-14 16:47 ` Butler, Peter
2014-04-14 17:06 ` Butler, Peter
` (7 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 16:47 UTC (permalink / raw)
To: linux-sctp
Glad to be of help :o)
-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com]
Sent: April-14-14 12:46 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
Ok, thanks! I'll send out a revert today, as this is otherwise catastrophic ... once that is done, we/the developer from NSN can still think about his patch and how to solve that differently.
On 04/14/2014 06:43 PM, Butler, Peter wrote:
> With the git revert command applied, the SCTP+IPv4 performance on 3.14.0 is back to normal! Same throughput as 3.4.2 and steady and smooth:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t
> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0+ #1
> SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
> Time: Mon, 14 Apr 2014 16:40:48 GMT
> Connecting to host 192.168.240.3, port 5201
> Cookie: Lab200slot2.1397493648.413274.65e131
> [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port
> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec
> [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec
> [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec
> [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec
> [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec
> [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec
> [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec
> [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec
>
>
>
>
>
>
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 7:58 PM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 10:18 PM, Butler, Peter wrote:
>> It may take a little time to do the bisection. Can you provide me with
>> a guesstimate as to which kernel(s) may have introduced this behaviour?
>
> Just out of curiosity, could you do a ...
>
> git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
>
> ... on your tree and try to see what you get with that?
>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 2:40 PM
>> To: Butler, Peter
>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>>> The difference between 3.14 and 3.4.2 is staggering - an order of
>>> magnitude or so. For example, using precisely the same setup as before,
>>> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
>>> between 70-150 Mbps with 3.14.
>>>
>>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>>> that it is always trying to recover. For example, with 3.4.2 the
>>> 2.1 Gbps throughput is quite consistent from one second to the next
>>> (as you would expect):
>>>
>>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452
>>> -t
>>> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2
>>> 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>>> Connecting to host 192.168.241.3, port 5201
>>> Cookie: Lab200slot2.1397240355.069035.0d5b0f
>>> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port
>>> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>>> [ ID] Interval Transfer Bandwidth
>>> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
>>> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
>>> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
>>> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
>>> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
>>> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
>>> (etc)
>>>
>>> but with 3.14 the numbers are all over the place:
>>>
>>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452
>>> -t
>>> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1
>>> SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>>> Connecting to host 192.168.241.3, port 5201
>>> Cookie: Lab200slot2.1397238981.812898.548918
>>> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port
>>> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>>> [ ID] Interval Transfer Bandwidth
>>> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
>>> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
>>> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
>>> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
>>> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
>>> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
>>> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
>>> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
>>> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
>>> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
>>> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
>>> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
>>> (etc)
>>>
>>> Note: the difference appears to be SCTP-specific, as I get exactly
>>> the same TCP throughput in both kernels.
>>
>> Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
>>
>> Thanks,
>>
>> Daniel
>>
>>> -----Original Message-----
>>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>>> Sent: April-11-14 11:22 AM
>>> To: Butler, Peter
>>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>>
>>> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>>>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>>>> enabled. (For what it's worth, the checksum offload gives about a 20%
>>>> throughput gain - but this is, of course, already included in the
>>>> numbers I posted to this thread as I've been using the CRC offload
>>>> all along.)
>>>
>>> Ok, understood.
>>>
>>>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides
>>>> of the association - i.e. on both endpoint nodes), and using 1452-byte
>>>> messages instead of 1000-byte messages. With this new setup, the TCP
>>>> performance drops significantly, as expected, while the SCTP
>>>> performance is boosted, and the playing field is somewhat more 'level'.
>>>> (Note that I could not use 1464-byte messages as suggested by Vlad, as
>>>> anything above 1452 cut the SCTP performance in half - must have hit
>>>> the segmentation limit at this slightly lower message size. MTU is 1500.)
>>>>
>>>> So comparing "apples to apples" now, TCP only outperforms SCTP by
>>>> approximately 40-70% over the range of network latencies I tested with
>>>> (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant,
>>>> but nowhere near the 200% better (i.e. 3 times the throughput) I was
>>>> getting before.
>>>>
>>>> Does this value (i.e. 40-70%) sound reasonable? Is this the
>>>> more-or-less accepted performance difference with the current LKSCTP
>>>> implementation?
>>>
>>> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time: e.g. the way chunks are handled, the copies involved in the fast path, our heavy use of atomic reference counting, and other open issues.
>>>
>>>> Also, for what it's worth, I get better SCTP throughput numbers with
>>>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>>>
>>> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>>>
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (27 preceding siblings ...)
2014-04-14 16:47 ` Butler, Peter
@ 2014-04-14 17:06 ` Butler, Peter
2014-04-14 17:10 ` Butler, Peter
` (6 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 17:06 UTC (permalink / raw)
To: linux-sctp
This particular branch of the email thread(s) is now defunct: as per Daniel Borkmann, the offending code was narrowed down to a single commit, which can be reverted with:
git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
With this patch reverted, 3.14.0 SCTP+IPv4 performance is back to normal and working properly.
See other branch of this thread for details.
-----Original Message-----
From: Butler, Peter
Sent: April-14-14 10:52 AM
To: Daniel Borkmann
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: RE: Is SCTP throughput really this low compared to TCP?
I started the bisection process - again, this will take some time, and I (unfortunately) can't dedicate all my time to this at work (other fires for me to put out here as well). However, on a hunch - based on TIPC work I had previously done, which suggested that the 3.10.x kernel stream was the last stream before significant changes were made to the underlying net API - I tested the latest kernel in that stream, namely 3.10.36.
As I suspected, this kernel is still 'good' - that is, there is no erratic SCTP/IPv4 throughput behaviour. However, the throughput is still better with the 3.4.2 kernel (2.1 Gbps as opposed to 1.75 Gbps).
SUMMARY so far for SCTP+IPv4:
3.4.2 kernel: smooth throughput 2.1 Gbps
3.10.36 kernel: smooth throughput 1.75 Gbps
3.14 kernel: highly erratic, terrible throughput
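Given that summary, the regression lies somewhere between v3.10.36 and v3.14. A bisection over that range could be driven roughly as follows (a hypothetical session; the tree location and the build/boot/load-test procedure between marks are assumptions, not commands quoted from the thread):

```shell
# Hypothetical bisect session against a kernel tree carrying both tags.
cd linux
git bisect start
git bisect bad v3.14        # erratic, low SCTP throughput
git bisect good v3.10.36    # smooth 1.75 Gbps throughput
# For each candidate commit: build, boot, rerun the load test, e.g.
#   iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
# then mark the result until git names the first bad commit:
git bisect good             # or: git bisect bad
git bisect reset            # return to the original HEAD when done
```

In this thread the culprit was found by a lucky guess (the revert of ef2820a735f7) before a full bisection completed, but the procedure above is the systematic fallback.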
-----Original Message-----
From: Daniel Borkmann [mailto:dborkman@redhat.com]
Sent: April-11-14 2:40 PM
To: Butler, Peter
Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 08:22 PM, Butler, Peter wrote:
> The difference between 3.14 and 3.4.2 is staggering - an order of
> magnitude or so. For example, using precisely the same setup as before,
> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
> between 70-150 Mbps with 3.14.
>
> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
> that it is always trying to recover. For example, with 3.4.2 the
> 2.1 Gbps throughput is quite consistent from one second to the next
> (as you would expect):
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t
> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2
> 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
> Time: Fri, 11 Apr 2014 18:19:15 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397240355.069035.0d5b0f
> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port
> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
> (etc)
>
> but with 3.14 the numbers are all over the place:
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t
> 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1
> SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Fri, 11 Apr 2014 17:56:21 GMT
> Connecting to host 192.168.241.3, port 5201
> Cookie: Lab200slot2.1397238981.812898.548918
> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port
> 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
> (etc)
>
> Note: the difference appears to be SCTP-specific, as I get exactly the
> same TCP throughput in both kernels.
Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
Thanks,
Daniel
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 11:22 AM
> To: Butler, Peter
> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>> enabled. (For what it's worth, the checksum offload gives about a 20%
>> throughput gain - but this is, of course, already included in the
>> numbers I posted to this thread as I've been using the CRC offload
>> all along.)
>
> Ok, understood.
>
>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides
>> of the association - i.e. on both endpoint nodes), and using 1452-byte
>> messages instead of 1000-byte messages. With this new setup, the TCP
>> performance drops significantly, as expected, while the SCTP
>> performance is boosted, and the playing field is somewhat more 'level'.
>> (Note that I could not use 1464-byte messages as suggested by Vlad, as
>> anything above 1452 cut the SCTP performance in half - must have hit
>> the segmentation limit at this slightly lower message size. MTU is 1500.)
>>
>> So comparing "apples to apples" now, TCP only outperforms SCTP by
>> approximately 40-70% over the range of network latencies I tested with
>> (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant,
>> but nowhere near the 200% better (i.e. 3 times the throughput) I was
>> getting before.
>>
>> Does this value (i.e. 40-70%) sound reasonable? Is this the
>> more-or-less accepted performance difference with the current LKSCTP
>> implementation?
>
> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time: e.g. the way chunks are handled, the copies involved in the fast path, our heavy use of atomic reference counting, and other open issues.
>
>> Also, for what it's worth, I get better SCTP throughput numbers with
>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>
> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (28 preceding siblings ...)
2014-04-14 17:06 ` Butler, Peter
@ 2014-04-14 17:10 ` Butler, Peter
2014-04-14 18:54 ` Matija Glavinic Pecotic
` (5 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-14 17:10 UTC (permalink / raw)
To: linux-sctp
This particular branch of the email thread(s) is now defunct: as per Daniel Borkmann, the offending code was narrowed down to a single commit, which can be reverted with:
git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
With this patch reverted, 3.14.0 SCTP+IPv4 performance is back to normal and working properly.
See other branch of this thread for details.
-----Original Message-----
From: Butler, Peter
Sent: April-14-14 11:50 AM
To: Vlad Yasevich; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: RE: Is SCTP throughput really this low compared to TCP?
Here are some perf numbers. Note that these were obtained with operf/opreport. Only the top 20 or so lines are shown here.
Identical load test performed on 3.4.2 and 3.14.
3.4.2:
CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples cumulative samples % cumulative % linenr info image name symbol name
130002 130002 4.9435 4.9435 copy_user_64.S:240 vmlinux copy_user_generic_string
125955 255957 4.7896 9.7330 memcpy_64.S:59 vmlinux memcpy
71026 326983 2.7008 12.4339 spinlock.c:136 vmlinux _raw_spin_lock
57138 384121 2.1727 14.6066 slub.c:409 vmlinux cmpxchg_double_slab
56559 440680 2.1507 16.7573 slub.c:2208 vmlinux __slab_alloc
53058 493738 2.0176 18.7749 ixgbe_main.c:2952 ixgbe.ko ixgbe_poll
51541 545279 1.9599 20.7348 slub.c:2439 vmlinux __slab_free
49916 595195 1.8981 22.6329 ip_tables.c:294 vmlinux ipt_do_table
42406 637601 1.6125 24.2455 ixgbe_main.c:7824 ixgbe.ko ixgbe_xmit_frame_ring
40929 678530 1.5564 25.8018 slub.c:3463 vmlinux kfree
40349 718879 1.5343 27.3361 core.c:132 vmlinux nf_iterate
35521 754400 1.3507 28.6869 output.c:347 sctp.ko sctp_packet_transmit
34071 788471 1.2956 29.9825 outqueue.c:1342 sctp.ko sctp_check_transmitted
33962 822433 1.2914 31.2739 slub.c:2601 vmlinux kmem_cache_free
33450 855883 1.2720 32.5459 outqueue.c:735 sctp.ko sctp_outq_flush
33005 888888 1.2551 33.8009 skbuff.c:172 vmlinux __alloc_skb
29231 918119 1.1115 34.9125 socket.c:1565 sctp.ko sctp_sendmsg
27950 946069 1.0628 35.9753 (no location information) libc-2.14.90.so __memmove_ssse3_back
26718 972787 1.0160 36.9913 (no location information) nf_conntrack.ko nf_conntrack_in
26589 999376 1.0111 38.0023 slub.c:4049 vmlinux __kmalloc_node_track_caller
26449 1025825 1.0058 39.0081 slub.c:2375 vmlinux kmem_cache_alloc
26211 1052036 0.9967 40.0048 sm_sideeffect.c:1074 sctp.ko sctp_do_sm
25527 1077563 0.9707 40.9755 slub.c:2404 vmlinux kmem_cache_alloc_node
23970 1101533 0.9115 41.8870 (no location information) libc-2.14.90.so _int_free
23266 1124799 0.8847 42.7717 memset_64.S:62 vmlinux memset
22976 1147775 0.8737 43.6454 (no location information) nf_conntrack.ko hash_conntrack_raw
21855 1169630 0.8311 44.4764 chunk.c:175 sctp.ko sctp_datamsg_from_user
21730 1191360 0.8263 45.3027 list_debug.c:24 vmlinux __list_add
21252 1212612 0.8081 46.1109 dev.c:3151 vmlinux __netif_receive_skb
20742 1233354 0.7887 46.8996 (no location information) libc-2.14.90.so _int_malloc
19955 1253309 0.7588 47.6584 input.c:130 sctp.ko sctp_rcv
3.14:
CPU: Intel Core/i7, speed 2.134e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
samples cumulative samples % cumulative % linenr info image name symbol name
168021 168021 6.1446 6.1446 copy_user_64.S:183 vmlinux-3.14.0 copy_user_generic_string
85199 253220 3.1158 9.2604 memcpy_64.S:59 vmlinux-3.14.0 memcpy
80133 333353 2.9305 12.1909 spinlock.c:174 vmlinux-3.14.0 _raw_spin_lock_bh
74086 407439 2.7094 14.9003 spinlock.c:150 vmlinux-3.14.0 _raw_spin_lock
51878 459317 1.8972 16.7975 ixgbe_main.c:6930 ixgbe.ko ixgbe_xmit_frame_ring
49354 508671 1.8049 18.6024 slub.c:2538 vmlinux-3.14.0 __slab_free
39103 547774 1.4300 20.0324 outqueue.c:706 sctp.ko sctp_outq_flush
37775 585549 1.3815 21.4139 outqueue.c:1304 sctp.ko sctp_check_transmitted
37514 623063 1.3719 22.7858 output.c:380 sctp.ko sctp_packet_transmit
37320 660383 1.3648 24.1506 slub.c:2700 vmlinux-3.14.0 kmem_cache_free
36147 696530 1.3219 25.4725 ip_tables.c:294 vmlinux-3.14.0 ipt_do_table
35494 732024 1.2980 26.7705 sm_sideeffect.c:1100 sctp.ko sctp_do_sm
35452 767476 1.2965 28.0670 core.c:135 vmlinux-3.14.0 nf_iterate
34697 802173 1.2689 29.3359 slub.c:2281 vmlinux-3.14.0 __slab_alloc
33890 836063 1.2394 30.5753 slub.c:415 vmlinux-3.14.0 cmpxchg_double_slab
33566 869629 1.2275 31.8028 (no location information) libc-2.14.90.so _int_free
33228 902857 1.2152 33.0180 socket.c:1590 sctp.ko sctp_sendmsg
32774 935631 1.1986 34.2166 slub.c:3381 vmlinux-3.14.0 kfree
30359 965990 1.1102 35.3268 (no location information) libc-2.14.90.so __memmove_ssse3_back
28905 994895 1.0571 36.3839 list_debug.c:25 vmlinux-3.14.0 __list_add
25888 1020783 0.9467 37.3306 skbuff.c:199 vmlinux-3.14.0 __alloc_skb
25490 1046273 0.9322 38.2628 fib_trie.c:1399 vmlinux-3.14.0 fib_table_lookup
25232 1071505 0.9227 39.1856 nf_conntrack_core.c:376 nf_conntrack.ko __nf_conntrack_find_get
24114 1095619 0.8819 40.0674 chunk.c:168 sctp.ko sctp_datamsg_from_user
24067 1119686 0.8801 40.9476 ixgbe_main.c:2020 ixgbe.ko ixgbe_clean_rx_irq
23972 1143658 0.8767 41.8242 (no location information) libc-2.14.90.so _int_malloc
22117 1165775 0.8088 42.6331 ip_output.c:215 vmlinux-3.14.0 ip_finish_output
22037 1187812 0.8059 43.4390 slub.c:3854 vmlinux-3.14.0 __kmalloc_node_track_caller
21847 1209659 0.7990 44.2379 dev.c:2546 vmlinux-3.14.0 dev_hard_start_xmit
21564 1231223 0.7886 45.0265 slub.c:2481 vmlinux-3.14.0 kmem_cache_alloc
20659 1251882 0.7555 45.7820 socket.c:2049 sctp.ko sctp_recvmsg
-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: April-11-14 11:28 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/11/2014 11:07 AM, Butler, Peter wrote:
> Yes indeed this is ixgbe and I do have SCTP checksum offloading
> enabled. (For what it's worth, the checksum offload gives about a 20%
> throughput gain - but this is, of course, already included in the
> numbers I posted to this thread as I've been using the CRC offload all
> along.)
>
> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of the association - i.e. on both endpoint nodes), and using 1452-byte messages instead of 1000-byte messages. With this new setup, the TCP performance drops significantly, as expected, while the SCTP performance is boosted, and the playing field is somewhat more 'level'. (Note that I could not use 1464-byte messages as suggested by Vlad, as anything above 1452 cut the SCTP performance in half - must have hit the segmentation limit at this slightly lower message size. MTU is 1500.)
>
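Peter's 1452-byte ceiling is exactly what the header arithmetic predicts; a quick sanity check (assuming IPv4 without options — the header sizes come from the protocol specs, and treating the result as the iperf -l limit is an inference, not something stated in the thread):

```python
# Largest single DATA-chunk payload that fits in one 1500-byte IP packet.
MTU = 1500
IP_HDR = 20          # IPv4 header without options
SCTP_HDR = 12        # SCTP common header
DATA_CHUNK_HDR = 16  # SCTP DATA chunk header
max_payload = MTU - IP_HDR - SCTP_HDR - DATA_CHUNK_HDR
print(max_payload)   # 1452
```

Anything above 1452 bytes forces the message into two DATA chunks, which is consistent with the throughput halving Peter observed; Vlad's 1464 figure appears to correspond to omitting the 12-byte common header from the same calculation.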
> So comparing "apples to apples" now, TCP only out-performs SCTP by approximately 40-70% over the various range of network latencies I tested with (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant, but nowhere near the 200% better (i.e. 3 times the throughput) I was getting before.
>
> Does this value (i.e. 40-70%) sound reasonable?
This still looks high. Could you run 'perf record -a' and 'perf report'
to see where we are spending all of our time in sctp.
My guess is that a lot of it is going to be in memcpy(), but I am curious.
> Is this the more-or-less accepted performance difference with the current LKSCTP implementation?
>
> Also, for what it's worth, I get better SCTP throughput numbers with the older kernel (3.4.2) than with the newer kernel (3.14)...
>
That's interesting. I'll have to look and see what might have changed here.
-vlad
>
>
> -----Original Message-----
> From: Daniel Borkmann [mailto:dborkman@redhat.com]
> Sent: April-11-14 3:43 AM
> To: Vlad Yasevich
> Cc: Butler, Peter; linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> Hi Peter,
>
> On 04/10/2014 10:21 PM, Vlad Yasevich wrote:
>> On 04/10/2014 03:12 PM, Butler, Peter wrote:
>>> I've been testing SCTP throughput between two nodes over a 10Gb-Ethernet backplane, and am finding that at best, its throughput is about a third of that of TCP. Is this number generally accepted for current LKSCTP performance?
>>>
>>> All TCP/SCTP tests performed with 1000-byte (payload) messages, between 8-core Xeon nodes @ 2.13GHz, with no CPU throttling (always running at 100%) on otherwise idle systems. Test applications include netperf, iperf and proprietary in-house stubs.
>>>
>>> The latency between nodes is generally 0.2 ms. Tests were run using this low-latency scenario, as well as using traffic control (tc) to simulate networks with 10 ms, 20 ms and 50 ms latency (i.e. 20 ms, 40 ms and 100 ms RTT, respectively).
>>>
>>> In addition, each of these network scenarios were tested using various kernel socket buffer sizes, ranging from the default kernel size (100-200 kB), to several MB for send and receive buffers, and multiple send:receive ratios for these buffer sizes (generally using larger receive buffer sizes, up to a factor of about 6).
>>>
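The socket-buffer sweep described above can be scripted; a minimal sketch (a plain TCP socket is used so the snippet runs without SCTP support — the same SOL_SOCKET options apply unchanged to an SCTP socket created with socket.IPPROTO_SCTP on an lksctp-enabled kernel):

```python
import socket

# Sweep receive-buffer sizes from the default range up to several MB.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
for size in (200 * 1024, 1024 * 1024, 4 * 1024 * 1024):
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    # Linux doubles the requested value to cover bookkeeping overhead and
    # caps it at net.core.rmem_max; read back the effective size.
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print(size, effective)
sock.close()
```

Reading the value back matters: without raising net.core.rmem_max/wmem_max via sysctl, the multi-megabyte requests are silently capped.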
>>> Finally, tests were performed on kernels as old as 3.4.2 and as recent as 3.14.
>>>
>>> The TCP throughput is about 3x higher than that of SCTP as a best-case scenario (i.e. from an SCTP perspective), and much higher still in worst-case scenarios.
>>
>> To do a more of apples-to-apples comparison, you need to disable
>> tso/gso on the sending node.
>>
>> The reason is that even if you limit buffer sizes, tcp will still try
>> to do tso on the transmit side, thus coalescing your 1000-byte
>> messages into something much larger, thus utilizing your MTU much more efficiently.
>>
>> SCTP, on the other hand, has to preserve message boundaries which
>> results in sub-optimal mtu utilization when using 1000-byte payloads.
>>
>> My recommendation is to use 1464-byte messages for SCTP on a 1500-byte
>> MTU NIC.
>>
>> I would be interested to see the results. There could very well be issues.
>
> Agreed.
>
> Also, what NIC are you using? It seems only Intel provides SCTP checksum offloading so far, i.e. ixgbe/i40e NICs.
>
>> -vlad
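Vlad's TSO point can be made concrete with a rough count of per-send operations (the 64 KiB TSO buffer size and the 1452-byte SCTP payload are illustrative assumptions, not measurements from this thread):

```python
# With TSO, TCP hands the NIC buffers of up to ~64 KiB and the hardware
# segments them; SCTP must build one PMTU-sized packet per bundle in
# software. Compare the software send operations needed to move 10 GiB.
DATA = 10 * 2**30                  # bytes to transfer
tso_units = DATA // (64 * 2**10)   # ~64 KiB buffers handed to the NIC
sctp_pkts = DATA // 1452           # PMTU-limited SCTP packets
print(tso_units, sctp_pkts, sctp_pkts // tso_units)
```

Roughly 45x more per-packet software work on the SCTP side under these assumptions, which is why disabling TSO/GSO in the re-test "levels the playing field".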
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (29 preceding siblings ...)
2014-04-14 17:10 ` Butler, Peter
@ 2014-04-14 18:54 ` Matija Glavinic Pecotic
2014-04-14 19:46 ` Daniel Borkmann
` (4 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Matija Glavinic Pecotic @ 2014-04-14 18:54 UTC (permalink / raw)
To: linux-sctp
Hello,
On 04/14/2014 06:45 PM, ext Daniel Borkmann wrote:
> Ok, thanks! I'll send out a revert for now today as this is otherwise
> catastrophic ... when that is done, we/the developer from NSN can still
> think about his patch and how to solve that differently.
Thanks from my side as well; I will certainly look into it. Congestion control seems to be completely broken. It seems the stack doesn't like the new way rwnd is calculated. The dependency on IPv4 is also interesting.
I also wasn't able to reproduce it with my laptop/desktop; it seems higher processing power is needed to hit the case.
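The rwnd change under discussion concerns when the receiver sends a window-update SACK. A toy model of that decision (the threshold below — window regained at least min(4 * PMTU, half the receive buffer) — is an assumption modelled loosely on the kernel's sctp_assoc_rwnd_update(), not a copy of its logic):

```python
def needs_window_update(a_rwnd, rwnd, pmtu, rcvbuf):
    """Return True if the window has reopened enough to advertise
    with an immediate window-update SACK.

    a_rwnd : window last advertised to the peer
    rwnd   : window available now that receive buffers have been freed
    """
    gain = rwnd - a_rwnd
    return gain >= min(4 * pmtu, rcvbuf // 2)

# Freeing one 1452-byte message barely moves the window; freeing four
# MTUs' worth crosses the threshold.
print(needs_window_update(0, 1452, 1500, 200_000))  # False
print(needs_window_update(0, 6000, 1500, 200_000))  # True
```

If this decision is made before the skb is actually freed and accounted (the ordering Vlad's patch later changes), the advertised window lags reality and the sender stalls — consistent with the erratic iperf intervals reported earlier in the thread.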
> On 04/14/2014 06:43 PM, Butler, Peter wrote:
>> With the git revert command applied, the SCTP+IPv4 performance on 3.14.0 is back to normal! Same throughput as 3.4.2 and steady and smooth:
>>
>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
>> iperf version 3.0.1 (10 January 2014)
>> Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64
>> Time: Mon, 14 Apr 2014 16:40:48 GMT
>> Connecting to host 192.168.240.3, port 5201
>> Cookie: Lab200slot2.1397493648.413274.65e131
>> [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201
>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>> [ ID] Interval Transfer Bandwidth
>> [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec
>> [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec
>> [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec
>> [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec
>> [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec
>> [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec
>> [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec
>> [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>> Sent: April-11-14 7:58 PM
>> To: Butler, Peter
>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>
>> On 04/11/2014 10:18 PM, Butler, Peter wrote:
>>> It may take a little time to do the bisection. Can you provide me with a
>>> guesstimate as to which kernel(s) may have introduced this behaviour?
>>
>> Just out of curiosity, could you do a ...
>>
>> git revert ef2820a735f74ea60335f8ba3801b844f0cb184d
>>
>> ... on your tree and try to see what you get with that?
>>
>>> -----Original Message-----
>>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>>> Sent: April-11-14 2:40 PM
>>> To: Butler, Peter
>>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>>
>>> On 04/11/2014 08:22 PM, Butler, Peter wrote:
>>>> The difference between 3.14 and 3.4.2 is staggering. An order of
>>>> magnitude or so. For example, using precisely the same setup as before,
>>>> whereas I get about 2.1 Gbps throughput with 3.4.2, I can only manage
>>>> between 70-150 Mbps with 3.14 - a staggering difference.
>>>>
>>>> Moreover, the SCTP throughput seems to 'choke' itself with 3.14, such
>>>> that it is always trying to recover. For example, with 3.4.2 the
>>>> 2.1 Gbps throughput is quite consistent from one second to the next (as
>>>> you would expect):
>>>>
>>>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>>>> iperf version 3.0.1 (10 January 2014)
>>>> Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
>>>> Time: Fri, 11 Apr 2014 18:19:15 GMT
>>>> Connecting to host 192.168.241.3, port 5201
>>>> Cookie: Lab200slot2.1397240355.069035.0d5b0f
>>>> [ 4] local 192.168.241.2 port 56030 connected to 192.168.241.3 port 5201
>>>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>>>> [ ID] Interval Transfer Bandwidth
>>>> [ 4] 0.00-1.00 sec 255 MBytes 2.14 Gbits/sec
>>>> [ 4] 1.00-2.00 sec 253 MBytes 2.12 Gbits/sec
>>>> [ 4] 2.00-3.00 sec 255 MBytes 2.14 Gbits/sec
>>>> [ 4] 3.00-4.00 sec 255 MBytes 2.14 Gbits/sec
>>>> [ 4] 4.00-5.00 sec 255 MBytes 2.14 Gbits/sec
>>>> [ 4] 5.00-6.00 sec 257 MBytes 2.15 Gbits/sec
>>>> [ 4] 6.00-7.00 sec 253 MBytes 2.13 Gbits/sec
>>>> [ 4] 7.00-8.00 sec 254 MBytes 2.13 Gbits/sec
>>>> [ 4] 8.00-9.00 sec 255 MBytes 2.14 Gbits/sec
>>>> [ 4] 9.00-10.00 sec 252 MBytes 2.12 Gbits/sec
>>>> (etc)
>>>>
>>>> but with 3.14 the numbers as all over the place:
>>>>
>>>> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60
>>>> iperf version 3.0.1 (10 January 2014)
>>>> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
>>>> Time: Fri, 11 Apr 2014 17:56:21 GMT
>>>> Connecting to host 192.168.241.3, port 5201
>>>> Cookie: Lab200slot2.1397238981.812898.548918
>>>> [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201
>>>> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
>>>> [ ID] Interval Transfer Bandwidth
>>>> [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec
>>>> [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec
>>>> [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec
>>>> [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec
>>>> [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec
>>>> [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec
>>>> [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec
>>>> [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec
>>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>>> [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec
>>>> [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec
>>>> [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec
>>>> [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec
>>>> [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec
>>>> [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec
>>>> [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec
>>>> (etc)
>>>>
>>>> Note: the difference appears to be SCTP-specific, as I get exactly the
>>>> same TCP throughput in both kernels.
>>>
>>> Hmm, okay. :/ Could you further bisect on your side to narrow down from which kernel onwards this behaviour can be seen?
>>>
>>> Thanks,
>>>
>>> Daniel
>>>
>>>> -----Original Message-----
>>>> From: Daniel Borkmann [mailto:dborkman@redhat.com]
>>>> Sent: April-11-14 11:22 AM
>>>> To: Butler, Peter
>>>> Cc: Vlad Yasevich; linux-sctp@vger.kernel.org
>>>> Subject: Re: Is SCTP throughput really this low compared to TCP?
>>>>
>>>> On 04/11/2014 05:07 PM, Butler, Peter wrote:
>>>>> Yes indeed this is ixgbe and I do have SCTP checksum offloading
>>>>> enabled. (For what it's worth, the checksum offload gives about a 20%
>>>>> throughput gain - but this is, of course, already included in the
>>>>> numbers I posted to this thread as I've been using the CRC offload all
>>>>> along.)
>>>>
>>>> Ok, understood.
>>>>
>>>>> I re-did all the tests with TSO/GSO/LRO/GRO disabled (on both sides of
>>>>> the association - i.e. on both endpoint nodes), and using 1452-byte
>>>>> messages instead of 1000-byte messages. With this new setup, the TCP
>>>>> performance drops significantly, as expected, while the SCTP
>>>>> performance is boosted, and the playing field is somewhat more 'level'.
>>>>> (Note that I could not use 1464-byte messages as suggested by Vlad, as
>>>>> anything above 1452 cut the SCTP performance in half - must have hit
>>>>> the segmentation limit at this slightly lower message size. MTU is
>>>>> 1500.)
>>>>>
>>>>> So comparing "apples to apples" now, TCP only out-performs SCTP by
>>>>> approximately 40-70% over the range of network latencies I tested with
>>>>> (RTTs of 0.2 ms, 10 ms, 20 ms, and 50 ms). 40-70% is still significant,
>>>>> but nowhere near the 200% better (i.e. 3 times the throughput) I was
>>>>> getting before.
>>>>>
>>>>> Does this value (i.e. 40-70%) sound reasonable? Is this the
>>>>> more-or-less accepted performance difference with the current LKSCTP
>>>>> implementation?
>>>>
>>>> Yes, that sounds reasonable to me. There are still a lot of open todos in terms of performance that we need to tackle over time: e.g. the way chunks are handled, imho, the copies involved in the fast path, the heavy use of atomic reference counting, and other open issues.
>>>>
>>>>> Also, for what it's worth, I get better SCTP throughput numbers with
>>>>> the older kernel (3.4.2) than with the newer kernel (3.14)...
>>>>
>>>> Interesting, a lot of things happened in between; were you able to bisect/identify a possible commit that causes this? How big is the difference?
>>>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (30 preceding siblings ...)
2014-04-14 18:54 ` Matija Glavinic Pecotic
@ 2014-04-14 19:46 ` Daniel Borkmann
2014-04-17 15:26 ` Vlad Yasevich
` (3 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Daniel Borkmann @ 2014-04-14 19:46 UTC (permalink / raw)
To: linux-sctp
On 04/14/2014 08:54 PM, Matija Glavinic Pecotic wrote:
> Hello,
>
> On 04/14/2014 06:45 PM, ext Daniel Borkmann wrote:
>> Ok, thanks! I'll send out a revert for now today as this is otherwise
>> catastrophic ... when that is done, we/the developer from NSN can still
>> think about his patch and how to solve that differently.
>
> Thanks from my side as well; I will certainly look into it. Congestion control seems to be completely broken. It seems the stack doesn't like the new way rwnd is calculated. The dependency on IPv4 is also interesting.
>
> I also wasn't able to reproduce it with my laptop/desktop; it seems higher processing power is needed to hit the case.
Posted here, Peter: http://patchwork.ozlabs.org/patch/339046/
Thanks for reporting and testing!
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (31 preceding siblings ...)
2014-04-14 19:46 ` Daniel Borkmann
@ 2014-04-17 15:26 ` Vlad Yasevich
2014-04-17 16:15 ` Butler, Peter
` (2 subsequent siblings)
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-17 15:26 UTC (permalink / raw)
To: linux-sctp
On 04/14/2014 12:47 PM, Butler, Peter wrote:
> Glad to be of help :o)
>
Hi Peter
Would you be able to run this test again with the following patch
on top of the problematic code.
Thanks
-vlad
commit c9888a220916284403c5115d6c6c7e33a00d0b55
Author: Vlad Yasevich <vyasevic@redhat.com>
Date: Thu Apr 17 09:21:52 2014 -0400
sctp: Trigger window update SACK after skb has been freed.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8d198ae..b59a7c5 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
{
struct sk_buff *skb, *frag;
unsigned int len;
- struct sctp_association *asoc;
/* Current stack structures assume that the rcv buffer is
* per socket. For UDP style sockets this is not true as
@@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
}
done:
- asoc = event->asoc;
- sctp_association_hold(asoc);
sctp_ulpevent_release_owner(event);
- sctp_assoc_rwnd_update(asoc, true);
- sctp_association_put(asoc);
}
static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event)
@@ -1071,12 +1066,21 @@ done:
*/
void sctp_ulpevent_free(struct sctp_ulpevent *event)
{
+ struct sctp_association *asoc = event->asoc;
+
if (sctp_ulpevent_is_notification(event))
sctp_ulpevent_release_owner(event);
else
sctp_ulpevent_release_data(event);
kfree_skb(sctp_event2skb(event));
+ /* The socket is locked and the association can't go anywhere
+ * since we are walking the ulpqueue. No need to hold
+ * another ref on the association. Now that the skb has been
+ * freed and accounted for everywhere, see if we need to send
+ * a window update SACK.
+ */
+ sctp_assoc_rwnd_update(asoc, true);
}
/* Purge the skb lists holding ulpevents. */
^ permalink raw reply related [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (32 preceding siblings ...)
2014-04-17 15:26 ` Vlad Yasevich
@ 2014-04-17 16:15 ` Butler, Peter
2014-04-22 21:50 ` Butler, Peter
2014-04-23 12:59 ` Vlad Yasevich
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-17 16:15 UTC (permalink / raw)
To: linux-sctp
I should be able to, but won't be able to run the test until Tuesday April 22...
________________________________________
From: Vlad Yasevich [vyasevich@gmail.com]
Sent: Thursday, April 17, 2014 11:26 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/14/2014 12:47 PM, Butler, Peter wrote:
> Glad to be of help :o)
>
Hi Peter
Would you be able to run this test again with the following patch
on top of the problematic code.
Thanks
-vlad
commit c9888a220916284403c5115d6c6c7e33a00d0b55
Author: Vlad Yasevich <vyasevic@redhat.com>
Date: Thu Apr 17 09:21:52 2014 -0400
sctp: Trigger window update SACK after skb has been freed.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index 8d198ae..b59a7c5 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
{
struct sk_buff *skb, *frag;
unsigned int len;
- struct sctp_association *asoc;
/* Current stack structures assume that the rcv buffer is
* per socket. For UDP style sockets this is not true as
@@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
}
done:
- asoc = event->asoc;
- sctp_association_hold(asoc);
sctp_ulpevent_release_owner(event);
- sctp_assoc_rwnd_update(asoc, true);
- sctp_association_put(asoc);
}
static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event)
@@ -1071,12 +1066,21 @@ done:
*/
void sctp_ulpevent_free(struct sctp_ulpevent *event)
{
+ struct sctp_association *asoc = event->asoc;
+
if (sctp_ulpevent_is_notification(event))
sctp_ulpevent_release_owner(event);
else
sctp_ulpevent_release_data(event);
kfree_skb(sctp_event2skb(event));
+ /* The socket is locked and the association can't go anywhere
+ * since we are walking the ulpqueue. No need to hold
+ * another ref on the association. Now that the skb has been
+ * freed and accounted for everywhere, see if we need to send
+ * a window update SACK.
+ */
+ sctp_assoc_rwnd_update(asoc, true);
}
/* Purge the skb lists holding ulpevents. */
^ permalink raw reply related [flat|nested] 37+ messages in thread
* RE: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (33 preceding siblings ...)
2014-04-17 16:15 ` Butler, Peter
@ 2014-04-22 21:50 ` Butler, Peter
2014-04-23 12:59 ` Vlad Yasevich
35 siblings, 0 replies; 37+ messages in thread
From: Butler, Peter @ 2014-04-22 21:50 UTC (permalink / raw)
To: linux-sctp
When I apply the patch you provided to the standard 3.14.0 kernel, I still get the highly erratic throughput (see output below). It was only when I did the full "git revert" as suggested by Daniel that the erratic behaviour went away.
[root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
Time: Tue, 22 Apr 2014 21:44:24 GMT
Connecting to host 192.168.240.3, port 5201
Cookie: Lab200slot2.1398203064.823332.513f07
[ 4] local 192.168.240.2 port 55819 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.08 sec 23.9 MBytes 186 Mbits/sec
[ 4] 1.08-2.13 sec 16.0 MBytes 128 Mbits/sec
[ 4] 2.13-3.95 sec 198 MBytes 913 Mbits/sec
[ 4] 3.95-4.00 sec 15.8 MBytes 2.62 Gbits/sec
[ 4] 4.00-5.00 sec 226 MBytes 1.90 Gbits/sec
[ 4] 5.00-6.84 sec 180 MBytes 819 Mbits/sec
[ 4] 6.84-7.00 sec 44.0 MBytes 2.30 Gbits/sec
[ 4] 7.00-8.01 sec 6.31 MBytes 52.2 Mbits/sec
[ 4] 8.01-9.08 sec 21.3 MBytes 167 Mbits/sec
[ 4] 9.08-10.12 sec 13.2 MBytes 107 Mbits/sec
[ 4] 10.12-11.17 sec 14.8 MBytes 119 Mbits/sec
[ 4] 11.17-12.97 sec 180 MBytes 839 Mbits/sec
[ 4] 12.97-13.00 sec 8.25 MBytes 2.27 Gbits/sec
[ 4] 13.00-14.10 sec 30.6 MBytes 234 Mbits/sec
[ 4] 14.10-15.95 sec 191 MBytes 866 Mbits/sec
[ 4] 15.95-16.00 sec 15.1 MBytes 2.51 Gbits/sec
[ 4] 16.00-17.00 sec 219 MBytes 1.84 Gbits/sec
[ 4] 17.00-18.09 sec 28.5 MBytes 218 Mbits/sec
[ 4] 18.09-19.13 sec 11.4 MBytes 92.5 Mbits/sec
[ 4] 19.13-20.17 sec 14.1 MBytes 114 Mbits/sec
[ 4] 20.17-21.21 sec 13.0 MBytes 105 Mbits/sec
[ 4] 21.21-23.27 sec 16.8 MBytes 68.4 Mbits/sec
[ 4] 23.27-23.27 sec 0.00 Bytes 0.00 bits/sec
[ 4] 23.27-24.00 sec 168 MBytes 1.91 Gbits/sec
[ 4] 24.00-25.76 sec 179 MBytes 852 Mbits/sec
[ 4] 25.76-26.00 sec 21.2 MBytes 760 Mbits/sec
-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com]
Sent: April-17-14 11:27 AM
To: Butler, Peter; Daniel Borkmann
Cc: linux-sctp@vger.kernel.org
Subject: Re: Is SCTP throughput really this low compared to TCP?
On 04/14/2014 12:47 PM, Butler, Peter wrote:
> Glad to be of help :o)
>
Hi Peter
Would you be able to run this test again with the following patch on top of the problematic code.
Thanks
-vlad
commit c9888a220916284403c5115d6c6c7e33a00d0b55
Author: Vlad Yasevich <vyasevic@redhat.com>
Date: Thu Apr 17 09:21:52 2014 -0400
sctp: Trigger window update SACK after skb has been freed.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c index 8d198ae..b59a7c5 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event) {
struct sk_buff *skb, *frag;
unsigned int len;
- struct sctp_association *asoc;
/* Current stack structures assume that the rcv buffer is
* per socket. For UDP style sockets this is not true as
@@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
}
done:
- asoc = event->asoc;
- sctp_association_hold(asoc);
sctp_ulpevent_release_owner(event);
- sctp_assoc_rwnd_update(asoc, true);
- sctp_association_put(asoc);
}
static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event) @@ -1071,12 +1066,21 @@ done:
*/
void sctp_ulpevent_free(struct sctp_ulpevent *event) {
+ struct sctp_association *asoc = event->asoc;
+
if (sctp_ulpevent_is_notification(event))
sctp_ulpevent_release_owner(event);
else
sctp_ulpevent_release_data(event);
kfree_skb(sctp_event2skb(event));
+ /* The socket is locked and the association can't go anywhere
+ * since we are walking the ulpqueue. No need to hold
+ * another ref on the association. Now that the skb has been
+ * freed and accounted for everywhere, see if we need to send
+ * a window update SACK.
+ */
+ sctp_assoc_rwnd_update(asoc, true);
}
/* Purge the skb lists holding ulpevents. */
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Is SCTP throughput really this low compared to TCP?
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
` (34 preceding siblings ...)
2014-04-22 21:50 ` Butler, Peter
@ 2014-04-23 12:59 ` Vlad Yasevich
35 siblings, 0 replies; 37+ messages in thread
From: Vlad Yasevich @ 2014-04-23 12:59 UTC (permalink / raw)
To: linux-sctp
On 04/22/2014 05:50 PM, Butler, Peter wrote:
> When I apply the patch you provided to the standard 3.14.0 kernel, I still get the highly erratic throughput (see output below). It was only when I did the full "git revert" as suggested by Daniel that the erratic behaviour went away.
>
> [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60
> iperf version 3.0.1 (10 January 2014)
> Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64
> Time: Tue, 22 Apr 2014 21:44:24 GMT
> Connecting to host 192.168.240.3, port 5201
> Cookie: Lab200slot2.1398203064.823332.513f07
> [ 4] local 192.168.240.2 port 55819 connected to 192.168.240.3 port 5201
> Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test
> [ ID] Interval Transfer Bandwidth
> [ 4] 0.00-1.08 sec 23.9 MBytes 186 Mbits/sec
> [ 4] 1.08-2.13 sec 16.0 MBytes 128 Mbits/sec
> [ 4] 2.13-3.95 sec 198 MBytes 913 Mbits/sec
> [ 4] 3.95-4.00 sec 15.8 MBytes 2.62 Gbits/sec
> [ 4] 4.00-5.00 sec 226 MBytes 1.90 Gbits/sec
> [ 4] 5.00-6.84 sec 180 MBytes 819 Mbits/sec
> [ 4] 6.84-7.00 sec 44.0 MBytes 2.30 Gbits/sec
> [ 4] 7.00-8.01 sec 6.31 MBytes 52.2 Mbits/sec
> [ 4] 8.01-9.08 sec 21.3 MBytes 167 Mbits/sec
> [ 4] 9.08-10.12 sec 13.2 MBytes 107 Mbits/sec
> [ 4] 10.12-11.17 sec 14.8 MBytes 119 Mbits/sec
> [ 4] 11.17-12.97 sec 180 MBytes 839 Mbits/sec
> [ 4] 12.97-13.00 sec 8.25 MBytes 2.27 Gbits/sec
> [ 4] 13.00-14.10 sec 30.6 MBytes 234 Mbits/sec
> [ 4] 14.10-15.95 sec 191 MBytes 866 Mbits/sec
> [ 4] 15.95-16.00 sec 15.1 MBytes 2.51 Gbits/sec
> [ 4] 16.00-17.00 sec 219 MBytes 1.84 Gbits/sec
> [ 4] 17.00-18.09 sec 28.5 MBytes 218 Mbits/sec
> [ 4] 18.09-19.13 sec 11.4 MBytes 92.5 Mbits/sec
> [ 4] 19.13-20.17 sec 14.1 MBytes 114 Mbits/sec
> [ 4] 20.17-21.21 sec 13.0 MBytes 105 Mbits/sec
> [ 4] 21.21-23.27 sec 16.8 MBytes 68.4 Mbits/sec
> [ 4] 23.27-23.27 sec 0.00 Bytes 0.00 bits/sec
> [ 4] 23.27-24.00 sec 168 MBytes 1.91 Gbits/sec
> [ 4] 24.00-25.76 sec 179 MBytes 852 Mbits/sec
> [ 4] 25.76-26.00 sec 21.2 MBytes 760 Mbits/sec
>
>
Thanks Peter. This means there is something else wrong...
-vlad
>
>
>
>
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: April-17-14 11:27 AM
> To: Butler, Peter; Daniel Borkmann
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: Is SCTP throughput really this low compared to TCP?
>
> On 04/14/2014 12:47 PM, Butler, Peter wrote:
>> Glad to be of help :o)
>>
>
> Hi Peter
>
> Would you be able to run this test again with the following patch on top of the problematic code.
>
> Thanks
> -vlad
>
>
> commit c9888a220916284403c5115d6c6c7e33a00d0b55
> Author: Vlad Yasevich <vyasevic@redhat.com>
> Date: Thu Apr 17 09:21:52 2014 -0400
>
> sctp: Trigger window update SACK after skb has been freed.
>
> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
>
> diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c index 8d198ae..b59a7c5 100644
> --- a/net/sctp/ulpevent.c
> +++ b/net/sctp/ulpevent.c
> @@ -1011,7 +1011,6 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event) {
> struct sk_buff *skb, *frag;
> unsigned int len;
> - struct sctp_association *asoc;
>
> /* Current stack structures assume that the rcv buffer is
> * per socket. For UDP style sockets this is not true as
> @@ -1036,11 +1035,7 @@ static void sctp_ulpevent_release_data(struct sctp_ulpevent *event)
> }
>
> done:
> - asoc = event->asoc;
> - sctp_association_hold(asoc);
> sctp_ulpevent_release_owner(event);
> - sctp_assoc_rwnd_update(asoc, true);
> - sctp_association_put(asoc);
> }
>
> static void sctp_ulpevent_release_frag_data(struct sctp_ulpevent *event) @@ -1071,12 +1066,21 @@ done:
> */
> void sctp_ulpevent_free(struct sctp_ulpevent *event) {
> + struct sctp_association *asoc = event->asoc;
> +
> if (sctp_ulpevent_is_notification(event))
> sctp_ulpevent_release_owner(event);
> else
> sctp_ulpevent_release_data(event);
>
> kfree_skb(sctp_event2skb(event));
> + /* The socket is locked and the association can't go anywhere
> + * since we are walking the ulpqueue. No need to hold
> + * another ref on the association. Now that the skb has been
> + * freed and accounted for everywhere, see if we need to send
> + * a window update SACK.
> + */
> + sctp_assoc_rwnd_update(asoc, true);
> }
>
> /* Purge the skb lists holding ulpevents. */
>
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2014-04-23 12:59 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-10 19:12 Is SCTP throughput really this low compared to TCP? Butler, Peter
2014-04-10 20:21 ` Vlad Yasevich
2014-04-10 20:40 ` Butler, Peter
2014-04-10 21:00 ` Vlad Yasevich
2014-04-11 7:42 ` Daniel Borkmann
2014-04-11 15:07 ` Butler, Peter
2014-04-11 15:21 ` Daniel Borkmann
2014-04-11 15:27 ` Vlad Yasevich
2014-04-11 15:35 ` Daniel Borkmann
2014-04-11 18:19 ` Vlad Yasevich
2014-04-11 18:22 ` Butler, Peter
2014-04-11 18:40 ` Daniel Borkmann
2014-04-11 18:41 ` Daniel Borkmann
2014-04-11 18:58 ` Butler, Peter
2014-04-11 19:16 ` Butler, Peter
2014-04-11 19:20 ` Vlad Yasevich
2014-04-11 19:24 ` Butler, Peter
2014-04-11 20:14 ` Butler, Peter
2014-04-11 20:18 ` Butler, Peter
2014-04-11 20:51 ` Vlad Yasevich
2014-04-11 20:53 ` Vlad Yasevich
2014-04-11 20:57 ` Butler, Peter
2014-04-11 23:58 ` Daniel Borkmann
2014-04-12 7:27 ` Dongsheng Song
2014-04-14 14:52 ` Butler, Peter
2014-04-14 15:49 ` Butler, Peter
2014-04-14 16:43 ` Butler, Peter
2014-04-14 16:45 ` Daniel Borkmann
2014-04-14 16:47 ` Butler, Peter
2014-04-14 17:06 ` Butler, Peter
2014-04-14 17:10 ` Butler, Peter
2014-04-14 18:54 ` Matija Glavinic Pecotic
2014-04-14 19:46 ` Daniel Borkmann
2014-04-17 15:26 ` Vlad Yasevich
2014-04-17 16:15 ` Butler, Peter
2014-04-22 21:50 ` Butler, Peter
2014-04-23 12:59 ` Vlad Yasevich