* SCTP throughput does not scale
@ 2014-05-01 17:55 Butler, Peter
  2014-05-01 22:51 ` Vlad Yasevich
                   ` (13 more replies)
  0 siblings, 14 replies; 15+ messages in thread
From: Butler, Peter @ 2014-05-01 17:55 UTC (permalink / raw)
  To: linux-sctp

It would appear that, in a latency-limited* network, the overall SCTP throughput for a given system does not scale.  That is, no matter how many associations are instantiated on a given system, the overall throughput remains capped at the same limit.  This is in contrast to TCP and UDP, where one can, for example, obtain twice the throughput by instantiating two connections instead of one.

*The above observation applies to a network where throughput is limited by network latency rather than by CPU or network bandwidth.  (In a low-latency system where throughput is instead limited by CPU and/or network bandwidth, we would not expect additional associations to yield better throughput, as the system/network is already maxed out.)
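(As a rough back-of-the-envelope illustration, not part of the original measurements: in a window-limited regime the per-connection throughput is bounded by roughly window/RTT.  With the 2 MB socket buffers and 50 ms RTT used below, that bound works out to about 2 MB / 0.05 s = 40 MB/s, i.e. on the order of 320 Mbps per connection; the usable window is typically smaller than the nominal buffer, so measured figures come in well below that ceiling.)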

Quantitative summary and detailed results are shown below for TCP and SCTP (UDP not shown, for brevity, but its behavior is similar to TCP's: two UDP 'connections' yield twice the throughput of one UDP 'connection', etc.).

Testing was performed on both the 3.4.2 and 3.14.0 kernels, using iperf3, with a network RTT of 50 ms (implemented manually via tc-netem), 1000-byte messages, and both send and receive socket kernel buffer sizes set to 2 MB, over a 10 Gbps backplane (although with the 50 ms RTT the high-speed backplane is not the limiting factor in this particular testing).
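For reference, the runs below correspond to invocations along the following lines (a sketch only - the exact options are an assumption, not quoted from the original tests, and may vary slightly between iperf3 versions):

  iperf3 -s -p 5201                                                   # server side
  iperf3 -c 192.168.240.3 -p 5201 -t 600 -l 1000 -w 2M -P 2           # TCP, 2 connections
  iperf3 -c 192.168.240.3 -p 5201 -t 600 -l 1000 -w 2M -P 2 --sctp    # SCTP, 2 associations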

Summary (rounding errors included in numbers below):

TCP, 1 connection:    1 x 144 Mbps = 144 Mbps total throughput
TCP, 2 connections:   2 x 144 Mbps = 287 Mbps total throughput
TCP, 3 connections:   3 x 145 Mbps = 434 Mbps total throughput

SCTP, 1 association:    1 x 122 Mbps = 122 Mbps total throughput
SCTP, 2 associations:   2 x 61.4 Mbps = 123 Mbps total throughput
SCTP, 3 associations:   3 x 41.4 Mbps = 124 Mbps total throughput

TCP (and UDP, not shown) scales, SCTP does not.



Actual iperf3 output below.


(1a) TCP, one connection

iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
Time: Sat, 26 Apr 2014 08:17:43 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1398500263.347708.23b404
      TCP MSS: 1448 (default)
[  4] local 192.168.240.2 port 40776 connected to 192.168.240.3 port 5201
Starting Test: protocol: TCP, 1 streams, 1000 byte blocks, omitting 0 seconds, 600 second test
.
. <snip>
.
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-600.05 sec  10.1 GBytes   144 Mbits/sec    0             sender
[  4]   0.00-600.05 sec  10.1 GBytes   144 Mbits/sec                  receiver
CPU Utilization: local/sender 1.2% (0.1%u/1.1%s), remote/receiver 3.2% (0.3%u/2.9%s)



(1b) TCP, two connections

iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
Time: Sat, 26 Apr 2014 08:28:44 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1398500924.026363.19e3a5
      TCP MSS: 1448 (default)
[  4] local 192.168.240.2 port 40780 connected to 192.168.240.3 port 5201
[  6] local 192.168.240.2 port 40781 connected to 192.168.240.3 port 5201
Starting Test: protocol: TCP, 2 streams, 1000 byte blocks, omitting 0 seconds, 600 second test
.
. <snip>
.
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-600.05 sec  10.0 GBytes   144 Mbits/sec  137             sender
[  4]   0.00-600.05 sec  10.0 GBytes   144 Mbits/sec                  receiver
[  6]   0.00-600.05 sec  10.0 GBytes   144 Mbits/sec    2             sender
[  6]   0.00-600.05 sec  10.0 GBytes   144 Mbits/sec                  receiver
[SUM]   0.00-600.05 sec  20.1 GBytes   287 Mbits/sec  139             sender
[SUM]   0.00-600.05 sec  20.0 GBytes   287 Mbits/sec                  receiver
CPU Utilization: local/sender 2.5% (0.2%u/2.3%s), remote/receiver 4.4% (0.4%u/4.1%s)



(1c) TCP, three connections

iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
Time: Sat, 26 Apr 2014 08:39:44 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1398501584.755085.3c10fa
      TCP MSS: 1448 (default)
[  4] local 192.168.240.2 port 40785 connected to 192.168.240.3 port 5201
[  6] local 192.168.240.2 port 40786 connected to 192.168.240.3 port 5201
[  8] local 192.168.240.2 port 40787 connected to 192.168.240.3 port 5201
Starting Test: protocol: TCP, 3 streams, 1000 byte blocks, omitting 0 seconds, 600 second test
.
. <snip>
.
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-600.05 sec  10.1 GBytes   145 Mbits/sec    0             sender
[  4]   0.00-600.05 sec  10.1 GBytes   145 Mbits/sec                  receiver
[  6]   0.00-600.05 sec  10.1 GBytes   145 Mbits/sec    7             sender
[  6]   0.00-600.05 sec  10.1 GBytes   145 Mbits/sec                  receiver
[  8]   0.00-600.05 sec  10.1 GBytes   145 Mbits/sec    4             sender
[  8]   0.00-600.05 sec  10.1 GBytes   145 Mbits/sec                  receiver
[SUM]   0.00-600.05 sec  30.3 GBytes   434 Mbits/sec   11             sender
[SUM]   0.00-600.05 sec  30.3 GBytes   434 Mbits/sec                  receiver
CPU Utilization: local/sender 3.7% (0.3%u/3.4%s), remote/receiver 5.7% (0.5%u/5.3%s)



(2a) SCTP, one association

iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
Time: Sat, 26 Apr 2014 05:30:34 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1398490234.326620.36ee2f
 [  4] local 192.168.240.2 port 46631 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 1 streams, 1000 byte blocks, omitting 0 seconds, 600 second test
.
. <snip>
.
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-600.07 sec  8.52 GBytes   122 Mbits/sec                  sender
[  4]   0.00-600.07 sec  8.52 GBytes   122 Mbits/sec                  receiver
CPU Utilization: local/sender 4.1% (0.2%u/3.9%s), remote/receiver 2.0% (0.2%u/1.8%s)



(2b) SCTP, two associations

iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
Time: Sat, 26 Apr 2014 05:41:35 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1398490895.079359.28c5c1
 [  4] local 192.168.240.2 port 34175 connected to 192.168.240.3 port 5201
 [  6] local 192.168.240.2 port 41448 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 2 streams, 1000 byte blocks, omitting 0 seconds, 600 second test
.
. <snip>
.
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-600.09 sec  4.29 GBytes  61.4 Mbits/sec                  sender
[  4]   0.00-600.09 sec  4.29 GBytes  61.4 Mbits/sec                  receiver
[  6]   0.00-600.09 sec  4.29 GBytes  61.4 Mbits/sec                  sender
[  6]   0.00-600.09 sec  4.29 GBytes  61.4 Mbits/sec                  receiver
[SUM]   0.00-600.09 sec  8.58 GBytes   123 Mbits/sec                  sender
[SUM]   0.00-600.09 sec  8.58 GBytes   123 Mbits/sec                  receiver
CPU Utilization: local/sender 2.5% (0.1%u/2.4%s), remote/receiver 1.7% (0.2%u/1.5%s)



(2c) SCTP, three associations

iperf version 3.0.1 (10 January 2014)
Linux Lab200slot2 3.4.2-1.fc16.x86_64 #1 SMP Thu Jun 14 20:17:26 UTC 2012 x86_64
Time: Sat, 26 Apr 2014 05:52:36 GMT
Connecting to host 192.168.240.3, port 5201
      Cookie: Lab200slot2.1398491555.947776.10993d
 [  4] local 192.168.240.2 port 45551 connected to 192.168.240.3 port 5201
 [  6] local 192.168.240.2 port 35528 connected to 192.168.240.3 port 5201
 [  8] local 192.168.240.2 port 44540 connected to 192.168.240.3 port 5201
Starting Test: protocol: SCTP, 3 streams, 1000 byte blocks, omitting 0 seconds, 600 second test
.
. <snip>
.
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-600.08 sec  2.90 GBytes  41.4 Mbits/sec                  sender
[  4]   0.00-600.08 sec  2.90 GBytes  41.4 Mbits/sec                  receiver
[  6]   0.00-600.08 sec  2.90 GBytes  41.4 Mbits/sec                  sender
[  6]   0.00-600.08 sec  2.90 GBytes  41.4 Mbits/sec                  receiver
[  8]   0.00-600.08 sec  2.90 GBytes  41.4 Mbits/sec                  sender
[  8]   0.00-600.08 sec  2.90 GBytes  41.4 Mbits/sec                  receiver
[SUM]   0.00-600.08 sec  8.69 GBytes   124 Mbits/sec                  sender
[SUM]   0.00-600.08 sec  8.69 GBytes   124 Mbits/sec                  receiver
CPU Utilization: local/sender 2.6% (0.1%u/2.5%s), remote/receiver 1.6% (0.1%u/1.5%s)






* Re: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
@ 2014-05-01 22:51 ` Vlad Yasevich
  2014-05-02  6:00 ` Butler, Peter
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2014-05-01 22:51 UTC (permalink / raw)
  To: linux-sctp

On 05/01/2014 01:55 PM, Butler, Peter wrote:
> It would appear that, in a latency-limited* network, the overall SCTP throughput for a given system does not scale.  That is, no matter how many associations are instantiated on a given system the overall throughput remains capped at the same limit.  This is in contrast to TCP and UDP where one can, for example, obtain twice the throughput by instantiating 2 TCP/UDP connections as opposed to only 1.
> 
> *The above observation applies to a network where the throughput is limited due to network latency as opposed to CPU and/or network bandwidth.  (In a low-latency system where throughput is instead limited by CPU and/or network bandwidth we would not expect to be able to obtain better throughput with more associations as the system/network is already maxed out).
> 
> Quantitative summary and detailed results are shown below for TCP and SCTP (UDP not shown, for brevity, but behavior is similar to TCP: 2 UDP 'connections' yield twice the throughput as 1 UDP 'connection', etc.).  
> 
> Testing performed on both the 3.4.2 kernel and 3.14.0 kernel, using iperf3, with a network RTT of 50 ms (manually implemented via tc-netem), using 1000-byte messages, with both send and receive socket kernel buffer sizes of 2 MB, over a 10 Gbps backplane (although for this particular testing (i.e. with 50 ms RTT latency) the high-speed backplane doesn't really factor in).
> 

Hi Peter

On the 3.14 kernel, could you retest with
/proc/sys/net/sctp/max_burst set to 0?

Thanks
-vlad
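
(For anyone reproducing this: the setting above can be applied at runtime with either of the following - a sketch, not taken from the thread itself:

  echo 0 > /proc/sys/net/sctp/max_burst
  sysctl -w net.sctp.max_burst=0
)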




* RE: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
  2014-05-01 22:51 ` Vlad Yasevich
@ 2014-05-02  6:00 ` Butler, Peter
  2014-05-02 11:34 ` Neil Horman
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Butler, Peter @ 2014-05-02  6:00 UTC (permalink / raw)
  To: linux-sctp

Hi Vlad

Just tested that now, as per your suggestion (max_burst=0 on 3.14.0, all other parameters the same as before).  The behaviour is the same - that is, SCTP throughput still does not scale, whereas TCP and UDP throughput do.  Results are shown below for 1, 2, 3 and 4 parallel associations, all transmitting DATA at their maximum rate.  Note that I used the stock 3.14.0 kernel, but with Daniel Borkmann's suggestion (from April 11 - see the other mailing-list thread I initiated, "Is SCTP throughput really this low compared to TCP?") to revert some code, namely:

git revert ef2820a735f74ea60335f8ba3801b844f0cb184d

(Without this modification, the 3.14.0 throughput is very poor, as originally reported.)

Here are the results (SCTP only) with your suggestion applied, based on 10-minute tests for each configuration.  In each case the CPU usage is very small - the throughput is limited entirely by the network latency, not by CPU or network bandwidth.

SCTP, 1 association:    1 x 102 Mbps = 102 Mbps total throughput
SCTP, 2 associations:   2 x 59.2 Mbps = 118 Mbps total throughput
SCTP, 3 associations:   3 x 39.8 Mbps = 119 Mbps total throughput
SCTP, 4 associations:   4 x 30.7 Mbps = 123 Mbps total throughput

For quick reference, here are the results for the same setup on the 3.4.2 kernel (with max_burst at its default value of 4).

SCTP, 1 association:    1 x 122 Mbps = 122 Mbps total throughput
SCTP, 2 associations:   2 x 61.4 Mbps = 123 Mbps total throughput
SCTP, 3 associations:   3 x 41.4 Mbps = 124 Mbps total throughput
SCTP, 4 associations:   4 x 25.3 Mbps = 127 Mbps total throughput




-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: May-01-14 6:52 PM
To: Butler, Peter; linux-sctp@vger.kernel.org
Subject: Re: SCTP throughput does not scale

On 05/01/2014 01:55 PM, Butler, Peter wrote:
> It would appear that, in a latency-limited* network, the overall SCTP throughput for a given system does not scale.  That is, no matter how many associations are instantiated on a given system the overall throughput remains capped at the same limit.  This is in contrast to TCP and UDP where one can, for example, obtain twice the throughput by instantiating 2 TCP/UDP connections as opposed to only 1.
> 
> *The above observation applies to a network where the throughput is limited due to network latency as opposed to CPU and/or network bandwidth.  (In a low-latency system where throughput is instead limited by CPU and/or network bandwidth we would not expect to be able to obtain better throughput with more associations as the system/network is already maxed out).
> 
> Quantitative summary and detailed results are shown below for TCP and SCTP (UDP not shown, for brevity, but behavior is similar to TCP: 2 UDP 'connections' yield twice the throughput as 1 UDP 'connection', etc.).  
> 
> Testing performed on both the 3.4.2 kernel and 3.14.0 kernel, using iperf3, with a network RTT of 50 ms (manually implemented via tc-netem), using 1000-byte messages, with both send and receive socket kernel buffer sizes of 2 MB, over a 10 Gbps backplane (although for this particular testing (i.e. with 50 ms RTT latency) the high-speed backplane doesn't really factor in).
> 

Hi Peter

On the 3.14 kernel, could you retest with /proc/sys/net/sctp/max_burst set to 0?

Thanks
-vlad




* Re: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
  2014-05-01 22:51 ` Vlad Yasevich
  2014-05-02  6:00 ` Butler, Peter
@ 2014-05-02 11:34 ` Neil Horman
  2014-05-02 11:45 ` Butler, Peter
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2014-05-02 11:34 UTC (permalink / raw)
  To: linux-sctp

On Thu, May 01, 2014 at 05:55:32PM +0000, Butler, Peter wrote:
> It would appear that, in a latency-limited* network, the overall SCTP throughput for a given system does not scale.  That is, no matter how many associations are instantiated on a given system the overall throughput remains capped at the same limit.  This is in contrast to TCP and UDP where one can, for example, obtain twice the throughput by instantiating 2 TCP/UDP connections as opposed to only 1.
> 
> *The above observation applies to a network where the throughput is limited due to network latency as opposed to CPU and/or network bandwidth.  (In a low-latency system where throughput is instead limited by CPU and/or network bandwidth we would not expect to be able to obtain better throughput with more associations as the system/network is already maxed out).
> 
> Quantitative summary and detailed results are shown below for TCP and SCTP (UDP not shown, for brevity, but behavior is similar to TCP: 2 UDP 'connections' yield twice the throughput as 1 UDP 'connection', etc.).  
> 
> Testing performed on both the 3.4.2 kernel and 3.14.0 kernel, using iperf3, with a network RTT of 50 ms (manually implemented via tc-netem), using 1000-byte messages, with both send and receive socket kernel buffer sizes of 2 MB, over a 10 Gbps backplane (although for this particular testing (i.e. with 50 ms RTT latency) the high-speed backplane doesn't really factor in).
> 
> Summary (rounding errors included in numbers below):
> 
> TCP, 1 connection:    1 x 144 Mbps = 144 Mbps total throughput
> TCP, 2 connections:   2 x 144 Mbps = 287 Mbps total throughput
> TCP, 3 connections:   3 x 145 Mbps = 434 Mbps total throughput
> 
> SCTP, 1 association:    1 x 122 Mbps = 122 Mbps total throughput
> SCTP, 2 associations:   2 x 61.4 Mbps = 123 Mbps total throughput
> SCTP, 3 associations:   3 x 41.4 Mbps = 124 Mbps total throughput
> 
What values are /proc/sys/net/sctp/[snd|rcv]buf_policy set to?  They default to
0, but should be set to 1 if you want to see throughput scaling with multiple
associations.

Neil
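
(For reference, these settings can be inspected and changed like so - a sketch, not part of the original message:

  cat /proc/sys/net/sctp/sndbuf_policy /proc/sys/net/sctp/rcvbuf_policy
  sysctl -w net.sctp.sndbuf_policy=1
  sysctl -w net.sctp.rcvbuf_policy=1
)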



* RE: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (2 preceding siblings ...)
  2014-05-02 11:34 ` Neil Horman
@ 2014-05-02 11:45 ` Butler, Peter
  2014-05-02 13:35 ` Neil Horman
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Butler, Peter @ 2014-05-02 11:45 UTC (permalink / raw)
  To: linux-sctp

I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style (one-to-one, SOCK_STREAM) associations, not UDP-style (one-to-many, SOCK_SEQPACKET) associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).



-----Original Message-----
From: linux-sctp-owner@vger.kernel.org [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
Sent: May-02-14 7:35 AM
To: Butler, Peter
Cc: linux-sctp@vger.kernel.org
Subject: Re: SCTP throughput does not scale

On Thu, May 01, 2014 at 05:55:32PM +0000, Butler, Peter wrote:
> It would appear that, in a latency-limited* network, the overall SCTP throughput for a given system does not scale.  That is, no matter how many associations are instantiated on a given system the overall throughput remains capped at the same limit.  This is in contrast to TCP and UDP where one can, for example, obtain twice the throughput by instantiating 2 TCP/UDP connections as opposed to only 1.
> 
> *The above observation applies to a network where the throughput is limited due to network latency as opposed to CPU and/or network bandwidth.  (In a low-latency system where throughput is instead limited by CPU and/or network bandwidth we would not expect to be able to obtain better throughput with more associations as the system/network is already maxed out).
> 
> Quantitative summary and detailed results are shown below for TCP and SCTP (UDP not shown, for brevity, but behavior is similar to TCP: 2 UDP 'connections' yield twice the throughput as 1 UDP 'connection', etc.).  
> 
> Testing performed on both the 3.4.2 kernel and 3.14.0 kernel, using iperf3, with a network RTT of 50 ms (manually implemented via tc-netem), using 1000-byte messages, with both send and receive socket kernel buffer sizes of 2 MB, over a 10 Gbps backplane (although for this particular testing (i.e. with 50 ms RTT latency) the high-speed backplane doesn't really factor in).
> 
> Summary (rounding errors included in numbers below):
> 
> TCP, 1 connection:    1 x 144 Mbps = 144 Mbps total throughput
> TCP, 2 connections:   2 x 144 Mbps = 287 Mbps total throughput
> TCP, 3 connections:   3 x 145 Mbps = 434 Mbps total throughput
> 
> SCTP, 1 association:    1 x 122 Mbps = 122 Mbps total throughput
> SCTP, 2 associations:   2 x 61.4 Mbps = 123 Mbps total throughput
> SCTP, 3 associations:   3 x 41.4 Mbps = 124 Mbps total throughput
> 
What values are /proc/sys/net/sctp/[snd|rcv]buf_policy set to?  They default to 0, but should be set to 1 if you want to see throughput scaling with multiple associations.

Neil



* Re: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (3 preceding siblings ...)
  2014-05-02 11:45 ` Butler, Peter
@ 2014-05-02 13:35 ` Neil Horman
  2014-05-02 16:33 ` Butler, Peter
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2014-05-02 13:35 UTC (permalink / raw)
  To: linux-sctp

On Fri, May 02, 2014 at 11:45:00AM +0000, Butler, Peter wrote:
> I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style (one-to-one, SOCK_STREAM) associations, not UDP-style (one-to-many, SOCK_SEQPACKET) associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).
> 
You're correct - if you're using TCP-style associations, the above policies won't
change much.

Such consistent throughput sharing still seems odd, though.  You don't have any
traffic shaping or policing implemented on your network devices, do you?  Either
on your sending or receiving system?  'tc qdisc show' would be able to tell you.
Such low throughput on a 10G interface seems like it could not be much other
than that.  Are you seeing any dropped frames in /proc/net/sctp/snmp or in
/proc/net/snmp[6]?

Neil
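
(A sketch of commands that would gather the information requested above - not from the original message, and the interface name is a placeholder:

  tc -s qdisc show dev <iface>
  grep -E 'Discard|OutSCTPPacks|InSCTPPacks' /proc/net/sctp/snmp
  cat /proc/net/snmp /proc/net/snmp6
)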



* RE: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (4 preceding siblings ...)
  2014-05-02 13:35 ` Neil Horman
@ 2014-05-02 16:33 ` Butler, Peter
  2014-05-02 16:37 ` Neil Horman
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Butler, Peter @ 2014-05-02 16:33 UTC (permalink / raw)
  To: linux-sctp

The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).

As for dropped frames, are you referring to SctpInPktDiscards?  SctpInPktDiscards is zero, or very small compared to the total number of transmitted packets.  For example, starting with all stats in /proc/net/sctp/snmp zeroed out and then running one minute's worth of traffic with the same setup (50 ms RTT, 1000-byte messages, 2 MB tx/rx buffer size), I get the following data in /proc/net/sctp/snmp when running two parallel associations (only the relevant lines are shown here):

client side (sending DATA):
SctpOutSCTPPacks                        938945
SctpInSCTPPacks                         473209
SctpInPktDiscards                       0
SctpInDataChunkDiscards                 0


server side (receiving DATA):
SctpOutSCTPPacks                        473209
SctpInSCTPPacks                         938457
SctpInPktDiscards                       0
SctpInDataChunkDiscards                 0
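
(A rough sanity check on these counters, as an aside: ~938,900 packets sent in ~60 s is about 15,600 packets/s, which at roughly 1000 bytes of DATA per packet is consistent with the ~120 Mbps total throughput reported above, and the receiver's ~473,200 outbound packets correspond to roughly one SACK for every two DATA packets, as expected with delayed SACKs.)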



-----Original Message-----
From: linux-sctp-owner@vger.kernel.org [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
Sent: May-02-14 9:36 AM
To: Butler, Peter
Cc: linux-sctp@vger.kernel.org
Subject: Re: SCTP throughput does not scale

On Fri, May 02, 2014 at 11:45:00AM +0000, Butler, Peter wrote:
> I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style (one-to-one, SOCK_STREAM) associations, not UDP-style (one-to-many, SOCK_SEQPACKET) associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).
> 
You're correct - if you're using TCP-style associations, the above policies won't change much.

Such consistent throughput sharing still seems odd, though.  You don't have any traffic shaping or policing implemented on your network devices, do you?  Either on your sending or receiving system?  'tc qdisc show' would be able to tell you.
Such low throughput on a 10G interface seems like it could not be much other than that.  Are you seeing any dropped frames in /proc/net/sctp/snmp or in /proc/net/snmp[6]?

Neil



* Re: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (5 preceding siblings ...)
  2014-05-02 16:33 ` Butler, Peter
@ 2014-05-02 16:37 ` Neil Horman
  2014-05-02 16:52 ` Butler, Peter
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2014-05-02 16:37 UTC (permalink / raw)
  To: linux-sctp

On Fri, May 02, 2014 at 04:33:23PM +0000, Butler, Peter wrote:
> The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).
> 
Can you dump out the tc configuration here?
Neil


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (6 preceding siblings ...)
  2014-05-02 16:37 ` Neil Horman
@ 2014-05-02 16:52 ` Butler, Peter
  2014-05-02 17:07 ` Vlad Yasevich
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Butler, Peter @ 2014-05-02 16:52 UTC (permalink / raw)
  To: linux-sctp

So that you can readily understand the output below, a tiny bit of background.  There are two high-performance blades communicating over a 10 Gbps dual-fabric Ethernet backplane.  Each blade has two 10 Gbps NICs, p19p1 and p19p2, each connected to one of the two segregated backplane fabrics.  The two blades are in slots 2 and 3, with hostnames 'slot2' and 'slot3' respectively.  The aliases for the backplane IP addresses are 'slot2_0' and 'slot2_1' (for the two slot 2 addresses) and 'slot3_0' and 'slot3_1' (for the two slot 3 addresses).

To 'fairly' implement the 50 ms RTT, I add a 25 ms delay on each blade (rather than the full 50 ms on only one of the two blades):

[root@slot2 ~]# tc qdisc show
qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms
qdisc netem 8001: dev p19p1 root refcnt 65 limit 1000 delay 25.0ms


[root@slot3 ~]# tc qdisc show
qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms
qdisc netem 8001: dev p19p1 root refcnt 65 limit 1000 delay 25.0ms


With the RTT implemented as such, I have a 50 ms RTT for both addresses used in the SCTP association (as well as any single IP used when performing the analogous test with TCP or UDP):

[root@slot2 ~]# ping slot3_0
PING slot3_0 (192.168.240.3) 56(84) bytes of data.
64 bytes from slot3_0 (192.168.240.3): icmp_req=1 ttl=64 time=50.3 ms
64 bytes from slot3_0 (192.168.240.3): icmp_req=2 ttl=64 time=50.6 ms
64 bytes from slot3_0 (192.168.240.3): icmp_req=3 ttl=64 time=50.6 ms
(etc.)

[root@slot2 ~]# ping slot3_1
PING slot3_1 (192.168.241.3) 56(84) bytes of data.
64 bytes from slot3_1 (192.168.241.3): icmp_req=1 ttl=64 time=50.4 ms
64 bytes from slot3_1 (192.168.241.3): icmp_req=2 ttl=64 time=50.4 ms
64 bytes from slot3_1 (192.168.241.3): icmp_req=3 ttl=64 time=50.4 ms
(etc.)
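
For reference, a netem delay like this would typically be installed with commands along the following lines (a sketch only - the exact invocations aren't quoted in this thread, and the 1000-packet limit shown above is simply the netem default):

[root@slot2 ~]# tc qdisc add dev p19p1 root netem delay 25ms
[root@slot2 ~]# tc qdisc add dev p19p2 root netem delay 25ms
(and the same two commands on slot3)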



-----Original Message-----
From: linux-sctp-owner@vger.kernel.org [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
Sent: May-02-14 12:37 PM
To: Butler, Peter
Cc: linux-sctp@vger.kernel.org
Subject: Re: SCTP throughput does not scale

On Fri, May 02, 2014 at 04:33:23PM +0000, Butler, Peter wrote:
> The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).
> 
Can you dump out the tc configuration here?
Neil

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (7 preceding siblings ...)
  2014-05-02 16:52 ` Butler, Peter
@ 2014-05-02 17:07 ` Vlad Yasevich
  2014-05-02 17:10 ` Butler, Peter
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2014-05-02 17:07 UTC (permalink / raw)
  To: linux-sctp

On 05/02/2014 12:33 PM, Butler, Peter wrote:
> The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).
> 

I am assuming that you are using netem.  What is the queue length?
What does the output of
 # tc -s qdisc show
look like?

Thanks
-vlad

> As for dropped frames, are you referring to SctpInPktDiscards?    SctpInPktDiscards is zero or very small (compared to the total number of transmitted packets).  For example, starting with all stats in /proc/net/sctp/snmp zeroed out, and then running one minute's worth of traffic with the same setup (50 ms RTT, 1000-byte messages, 2 MB tx/rx buffer size) I get the following data in /proc/net/sctp/snmp when running two parallel associations (only relevant lines shown here):
> 
> client side (sending DATA):
> SctpOutSCTPPacks                        938945
> SctpInSCTPPacks                         473209
> SctpInPktDiscards                       0
> SctpInDataChunkDiscards                 0
> 
> 
> server side (receiving DATA):
> SctpOutSCTPPacks                        473209
> SctpInSCTPPacks                         938457
> SctpInPktDiscards                       0
> SctpInDataChunkDiscards                 0
> 
> 
> 
> -----Original Message-----
> From: linux-sctp-owner@vger.kernel.org [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
> Sent: May-02-14 9:36 AM
> To: Butler, Peter
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: SCTP throughput does not scale
> 
> On Fri, May 02, 2014 at 11:45:00AM +0000, Butler, Peter wrote:
>> I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style SOCK_DGRAM associations, and not UDP-style SOCK_SEQPACKET associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).  
>>
> You're correct, if you're using TCP style associations, the above policies won't change much.
> 
> Such consistent throughput sharing though still seems odd.  You don't have any traffic shaping or policing implimented on your network devices do you?  Either on your sending or receiving system?  tc qdisc show would be able to tell you.
> Such low throughput on a 10G interface seems like it could not be much other than that.  Are you seeing any droped frames in /proc/net/sctp/snmp or in /proc/net/snmp[6]?
> 
> Neil
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (8 preceding siblings ...)
  2014-05-02 17:07 ` Vlad Yasevich
@ 2014-05-02 17:10 ` Butler, Peter
  2014-05-02 17:34 ` Vlad Yasevich
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Butler, Peter @ 2014-05-02 17:10 UTC (permalink / raw)
  To: linux-sctp

[root@slot2 ~]# tc -s qdisc show
qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms
 Sent 590200 bytes 7204 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc netem 8001: dev p19p1 root refcnt 65 limit 1000 delay 25.0ms
 Sent 997332352 bytes 946411 pkt (dropped 478, overlimits 0 requeues 1) 
 backlog 114b 1p requeues 1 


[root@slot3 ~]# tc -s qdisc show
qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms
 Sent 90352 bytes 1666 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc netem 8001: dev p19p1 root refcnt 65 limit 1000 delay 25.0ms
 Sent 29544962 bytes 475167 pkt (dropped 0, overlimits 0 requeues 2) 
 backlog 130b 1p requeues 2




-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: May-02-14 1:07 PM
To: Butler, Peter; Neil Horman
Cc: linux-sctp@vger.kernel.org
Subject: Re: SCTP throughput does not scale

On 05/02/2014 12:33 PM, Butler, Peter wrote:
> The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).
> 

I am assuming that you are using netem.  What is the queue length?
What does the output of
 # tc -s qdisc show
look like?

Thanks
-vlad

> As for dropped frames, are you referring to SctpInPktDiscards?    SctpInPktDiscards is zero or very small (compared to the total number of transmitted packets).  For example, starting with all stats in /proc/net/sctp/snmp zeroed out, and then running one minute's worth of traffic with the same setup (50 ms RTT, 1000-byte messages, 2 MB tx/rx buffer size) I get the following data in /proc/net/sctp/snmp when running two parallel associations (only relevant lines shown here):
> 
> client side (sending DATA):
> SctpOutSCTPPacks                        938945
> SctpInSCTPPacks                         473209
> SctpInPktDiscards                       0
> SctpInDataChunkDiscards                 0
> 
> 
> server side (receiving DATA):
> SctpOutSCTPPacks                        473209
> SctpInSCTPPacks                         938457
> SctpInPktDiscards                       0
> SctpInDataChunkDiscards                 0
> 
> 
> 
> -----Original Message-----
> From: linux-sctp-owner@vger.kernel.org 
> [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
> Sent: May-02-14 9:36 AM
> To: Butler, Peter
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: SCTP throughput does not scale
> 
> On Fri, May 02, 2014 at 11:45:00AM +0000, Butler, Peter wrote:
>> I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style SOCK_DGRAM associations, and not UDP-style SOCK_SEQPACKET associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).  
>>
> You're correct, if you're using TCP style associations, the above policies won't change much.
> 
> Such consistent throughput sharing though still seems odd.  You don't have any traffic shaping or policing implimented on your network devices do you?  Either on your sending or receiving system?  tc qdisc show would be able to tell you.
> Such low throughput on a 10G interface seems like it could not be much other than that.  Are you seeing any droped frames in /proc/net/sctp/snmp or in /proc/net/snmp[6]?
> 
> Neil
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (9 preceding siblings ...)
  2014-05-02 17:10 ` Butler, Peter
@ 2014-05-02 17:34 ` Vlad Yasevich
  2014-05-02 19:13 ` Butler, Peter
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2014-05-02 17:34 UTC (permalink / raw)
  To: linux-sctp

On 05/02/2014 01:10 PM, Butler, Peter wrote:
> [root@slot2 ~]# tc -s qdisc show
> qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms
>  Sent 590200 bytes 7204 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
> qdisc netem 8001: dev p19p1 root refcnt 65 limit 1000 delay 25.0ms
>  Sent 997332352 bytes 946411 pkt (dropped 478, overlimits 0 requeues 1) 
>  backlog 114b 1p requeues 1 
> 

Thanks.  The above shows a drop of 478 packets.  You might try growing
your queue size.  Remember that SCTP is very much packet-oriented,
and with your message size of 1000 bytes, each message ends up occupying
one under-utilized packet.

Meanwhile TCP will coalesce your 1000-byte writes into full MSS-sized
segments (plus GSO/TSO if you are still using it).  That allows TCP
to utilize its packets much more effectively.

-vlad
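
A rough back-of-envelope with the numbers quoted earlier in the thread (approximate, and assuming one 1000-byte DATA chunk per packet) illustrates the point:

  per-association rate:       61.4 Mbit/s / 8 / ~1000 bytes   ~= 7,700 packets/s
  bandwidth-delay product:    61.4 Mbit/s * 50 ms / 8         ~= 384 KB  ~= 380 packets in flight
  two associations bursting a full window at once             ~= 760+ packets

So cwnd-sized bursts from two or more associations can transiently approach or exceed netem's 1000-packet queue (hence the 478 drops above), while TCP's coalesced ~1448-byte segments need roughly a third fewer packets for the same payload.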

> 
> [root@slot3 ~]# tc -s qdisc show
> qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms
>  Sent 90352 bytes 1666 pkt (dropped 0, overlimits 0 requeues 0) 
>  backlog 0b 0p requeues 0 
> qdisc netem 8001: dev p19p1 root refcnt 65 limit 1000 delay 25.0ms
>  Sent 29544962 bytes 475167 pkt (dropped 0, overlimits 0 requeues 2) 
>  backlog 130b 1p requeues 2
> 
> 
> 
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
> Sent: May-02-14 1:07 PM
> To: Butler, Peter; Neil Horman
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: SCTP throughput does not scale
> 
> On 05/02/2014 12:33 PM, Butler, Peter wrote:
>> The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).
>>
> 
> I am assuming that you are using netem.  What is the queue length?
> What is the output of
>  # tc -s qdisc show
> 
> look like.
> 
> Thanks
> -vlad
> 
>> As for dropped frames, are you referring to SctpInPktDiscards?    SctpInPktDiscards is zero or very small (compared to the total number of transmitted packets).  For example, starting with all stats in /proc/net/sctp/snmp zeroed out, and then running one minute's worth of traffic with the same setup (50 ms RTT, 1000-byte messages, 2 MB tx/rx buffer size) I get the following data in /proc/net/sctp/snmp when running two parallel associations (only relevant lines shown here):
>>
>> client side (sending DATA):
>> SctpOutSCTPPacks                        938945
>> SctpInSCTPPacks                         473209
>> SctpInPktDiscards                       0
>> SctpInDataChunkDiscards                 0
>>
>>
>> server side (receiving DATA):
>> SctpOutSCTPPacks                        473209
>> SctpInSCTPPacks                         938457
>> SctpInPktDiscards                       0
>> SctpInDataChunkDiscards                 0
>>
>>
>>
>> -----Original Message-----
>> From: linux-sctp-owner@vger.kernel.org 
>> [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
>> Sent: May-02-14 9:36 AM
>> To: Butler, Peter
>> Cc: linux-sctp@vger.kernel.org
>> Subject: Re: SCTP throughput does not scale
>>
>> On Fri, May 02, 2014 at 11:45:00AM +0000, Butler, Peter wrote:
>>> I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style SOCK_DGRAM associations, and not UDP-style SOCK_SEQPACKET associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).  
>>>
>> You're correct, if you're using TCP style associations, the above policies won't change much.
>>
>> Such consistent throughput sharing though still seems odd.  You don't have any traffic shaping or policing implimented on your network devices do you?  Either on your sending or receiving system?  tc qdisc show would be able to tell you.
>> Such low throughput on a 10G interface seems like it could not be much other than that.  Are you seeing any droped frames in /proc/net/sctp/snmp or in /proc/net/snmp[6]?
>>
>> Neil
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (10 preceding siblings ...)
  2014-05-02 17:34 ` Vlad Yasevich
@ 2014-05-02 19:13 ` Butler, Peter
  2014-05-02 19:39 ` Vlad Yasevich
  2014-05-02 20:14 ` Butler, Peter
  13 siblings, 0 replies; 15+ messages in thread
From: Butler, Peter @ 2014-05-02 19:13 UTC (permalink / raw)
  To: linux-sctp

Recall that the issue here isn't that TCP outperforms SCTP - i.e. that it has higher throughput overall - but that TCP (and UDP) scale up when more connections are added, whereas SCTP does not.  Changing the message size (say, from 1000 bytes to 1452 bytes) and modifying the GSO/TSO/GRO/LRO NIC settings does indeed change the overall SCTP and TCP throughput (and closes the throughput gap between the protocols).  The fact remains, however, that I can still double the overall TCP system throughput by adding a second TCP connection, whereas I cannot double the SCTP throughput by adding a second SCTP association.  (Again, in the latter case the overall throughput remains constant, with each of the two associations now carrying half the traffic that the lone association did on its own.)



-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: May-02-14 1:34 PM
To: Butler, Peter; Neil Horman
Cc: linux-sctp@vger.kernel.org
Subject: Re: SCTP throughput does not scale

On 05/02/2014 01:10 PM, Butler, Peter wrote:
> [root@slot2 ~]# tc -s qdisc show
> qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms  
> Sent 590200 bytes 7204 pkt (dropped 0, overlimits 0 requeues 0)  
> backlog 0b 0p requeues 0 qdisc netem 8001: dev p19p1 root refcnt 65 
> limit 1000 delay 25.0ms  Sent 997332352 bytes 946411 pkt (dropped 478, 
> overlimits 0 requeues 1)  backlog 114b 1p requeues 1
> 

Thanks.  The above shows a drop of 478 packets.  You might try growing your queue size.  Remember that SCTP is very much packet-oriented, and with your message size of 1000 bytes, each message ends up occupying
one under-utilized packet.

Meanwhile TCP will coalesce your 1000-byte writes into full MSS-sized segments (plus GSO/TSO if you are still using it).  That allows TCP to utilize its packets much more effectively.

-vlad

> 
> [root@slot3 ~]# tc -s qdisc show
> qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms  
> Sent 90352 bytes 1666 pkt (dropped 0, overlimits 0 requeues 0)  
> backlog 0b 0p requeues 0 qdisc netem 8001: dev p19p1 root refcnt 65 
> limit 1000 delay 25.0ms  Sent 29544962 bytes 475167 pkt (dropped 0, 
> overlimits 0 requeues 2)  backlog 130b 1p requeues 2
> 
> 
> 
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: May-02-14 1:07 PM
> To: Butler, Peter; Neil Horman
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: SCTP throughput does not scale
> 
> On 05/02/2014 12:33 PM, Butler, Peter wrote:
>> The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).
>>
> 
> I am assuming that you are using netem.  What is the queue length?
> What is the output of
>  # tc -s qdisc show
> 
> look like.
> 
> Thanks
> -vlad
> 
>> As for dropped frames, are you referring to SctpInPktDiscards?    SctpInPktDiscards is zero or very small (compared to the total number of transmitted packets).  For example, starting with all stats in /proc/net/sctp/snmp zeroed out, and then running one minute's worth of traffic with the same setup (50 ms RTT, 1000-byte messages, 2 MB tx/rx buffer size) I get the following data in /proc/net/sctp/snmp when running two parallel associations (only relevant lines shown here):
>>
>> client side (sending DATA):
>> SctpOutSCTPPacks                        938945
>> SctpInSCTPPacks                         473209
>> SctpInPktDiscards                       0
>> SctpInDataChunkDiscards                 0
>>
>>
>> server side (receiving DATA):
>> SctpOutSCTPPacks                        473209
>> SctpInSCTPPacks                         938457
>> SctpInPktDiscards                       0
>> SctpInDataChunkDiscards                 0
>>
>>
>>
>> -----Original Message-----
>> From: linux-sctp-owner@vger.kernel.org 
>> [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
>> Sent: May-02-14 9:36 AM
>> To: Butler, Peter
>> Cc: linux-sctp@vger.kernel.org
>> Subject: Re: SCTP throughput does not scale
>>
>> On Fri, May 02, 2014 at 11:45:00AM +0000, Butler, Peter wrote:
>>> I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style SOCK_DGRAM associations, and not UDP-style SOCK_SEQPACKET associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).  
>>>
>> You're correct, if you're using TCP style associations, the above policies won't change much.
>>
>> Such consistent throughput sharing though still seems odd.  You don't have any traffic shaping or policing implimented on your network devices do you?  Either on your sending or receiving system?  tc qdisc show would be able to tell you.
>> Such low throughput on a 10G interface seems like it could not be much other than that.  Are you seeing any droped frames in /proc/net/sctp/snmp or in /proc/net/snmp[6]?
>>
>> Neil
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (11 preceding siblings ...)
  2014-05-02 19:13 ` Butler, Peter
@ 2014-05-02 19:39 ` Vlad Yasevich
  2014-05-02 20:14 ` Butler, Peter
  13 siblings, 0 replies; 15+ messages in thread
From: Vlad Yasevich @ 2014-05-02 19:39 UTC (permalink / raw)
  To: linux-sctp

On 05/02/2014 03:13 PM, Butler, Peter wrote:
> Recall that the issue here isn't that TCP outperforms SCTP - i.e. that it has higher throughput overall - but that TCP (and UDP) scale up when more connections are added, whereas SCTP does not.  So while changing the message size (say, from 1000 bytes to 1452 bytes) and modifying the GSO/TSO/GRO/LRO NIC settings does indeed change the overall SCTP and TCP throughput (and closes the throughput gap between these protocols), the fact remains that I can still then double the overall TCP system throughput by adding in a second TCP connection, whereas I cannot double the SCTP throughput by adding in a second SCTP association.  (Again, in the latter case the overall throughput remains constant with the two associations now carrying half the traffic as the lone association did in the former case.)
> 

Right, I understand.  However, TCP will end up sending fewer packets than
SCTP due to the stream nature of TCP.  So, you may not be hitting
a netem drop limitation with TCP.  This would be an interesting data point.

Run a 2 stream TCP perf session with default netem settings and see
if you have qdisc drops.

Then run a 2 stream SCTP perf session and check for drops.

-vlad
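
A sketch of that comparison, using the endpoints and qdisc device from earlier in the thread (the --sctp and -P flags are assumptions about the iperf3 build in use):

# TCP, two parallel streams, then check the sender-side netem drop counter
iperf3 -c 192.168.240.3 -P 2 -l 1000 -t 60
tc -s qdisc show dev p19p1

# SCTP, two parallel associations, same check
iperf3 -c 192.168.240.3 --sctp -P 2 -l 1000 -t 60
tc -s qdisc show dev p19p1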

> 
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
> Sent: May-02-14 1:34 PM
> To: Butler, Peter; Neil Horman
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: SCTP throughput does not scale
> 
> On 05/02/2014 01:10 PM, Butler, Peter wrote:
>> [root@slot2 ~]# tc -s qdisc show
>> qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms  
>> Sent 590200 bytes 7204 pkt (dropped 0, overlimits 0 requeues 0)  
>> backlog 0b 0p requeues 0 qdisc netem 8001: dev p19p1 root refcnt 65 
>> limit 1000 delay 25.0ms  Sent 997332352 bytes 946411 pkt (dropped 478, 
>> overlimits 0 requeues 1)  backlog 114b 1p requeues 1
>>
> 
> Thanks.  The above shows a drop of 478 packets.  You might try growing you queue size.  Remember that SCTP is very much packet oriented and with your size of 1000 bytes, each message ends up taking up
> 1 under-utilized packet.
> 
> Meanwhile TCP will coalesce your 1000 byte writes into full mss sized writes (plug GSO/TSO if you are still using it).  That allows TCP to much more effectively utilized the packets.
> 
> -vlad
> 
>>
>> [root@slot3 ~]# tc -s qdisc show
>> qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms  
>> Sent 90352 bytes 1666 pkt (dropped 0, overlimits 0 requeues 0)  
>> backlog 0b 0p requeues 0 qdisc netem 8001: dev p19p1 root refcnt 65 
>> limit 1000 delay 25.0ms  Sent 29544962 bytes 475167 pkt (dropped 0, 
>> overlimits 0 requeues 2)  backlog 130b 1p requeues 2
>>
>>
>>
>>
>> -----Original Message-----
>> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
>> Sent: May-02-14 1:07 PM
>> To: Butler, Peter; Neil Horman
>> Cc: linux-sctp@vger.kernel.org
>> Subject: Re: SCTP throughput does not scale
>>
>> On 05/02/2014 12:33 PM, Butler, Peter wrote:
>>> The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).
>>>
>>
>> I am assuming that you are using netem.  What is the queue length?
>> What is the output of
>>  # tc -s qdisc show
>>
>> look like.
>>
>> Thanks
>> -vlad
>>
>>> As for dropped frames, are you referring to SctpInPktDiscards?    SctpInPktDiscards is zero or very small (compared to the total number of transmitted packets).  For example, starting with all stats in /proc/net/sctp/snmp zeroed out, and then running one minute's worth of traffic with the same setup (50 ms RTT, 1000-byte messages, 2 MB tx/rx buffer size) I get the following data in /proc/net/sctp/snmp when running two parallel associations (only relevant lines shown here):
>>>
>>> client side (sending DATA):
>>> SctpOutSCTPPacks                        938945
>>> SctpInSCTPPacks                         473209
>>> SctpInPktDiscards                       0
>>> SctpInDataChunkDiscards                 0
>>>
>>>
>>> server side (receiving DATA):
>>> SctpOutSCTPPacks                        473209
>>> SctpInSCTPPacks                         938457
>>> SctpInPktDiscards                       0
>>> SctpInDataChunkDiscards                 0
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: linux-sctp-owner@vger.kernel.org 
>>> [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
>>> Sent: May-02-14 9:36 AM
>>> To: Butler, Peter
>>> Cc: linux-sctp@vger.kernel.org
>>> Subject: Re: SCTP throughput does not scale
>>>
>>> On Fri, May 02, 2014 at 11:45:00AM +0000, Butler, Peter wrote:
>>>> I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style SOCK_DGRAM associations, and not UDP-style SOCK_SEQPACKET associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).  
>>>>
>>> You're correct, if you're using TCP style associations, the above policies won't change much.
>>>
>>> Such consistent throughput sharing though still seems odd.  You don't have any traffic shaping or policing implimented on your network devices do you?  Either on your sending or receiving system?  tc qdisc show would be able to tell you.
>>> Such low throughput on a 10G interface seems like it could not be much other than that.  Are you seeing any droped frames in /proc/net/sctp/snmp or in /proc/net/snmp[6]?
>>>
>>> Neil
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: SCTP throughput does not scale
  2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
                   ` (12 preceding siblings ...)
  2014-05-02 19:39 ` Vlad Yasevich
@ 2014-05-02 20:14 ` Butler, Peter
  13 siblings, 0 replies; 15+ messages in thread
From: Butler, Peter @ 2014-05-02 20:14 UTC (permalink / raw)
  To: linux-sctp

Those are excellent points you make.

I just tried retesting as per your suggestion (2 parallel connections/associations with default netem settings) and sure enough I do not see any drops in the TCP test whereas I do in the SCTP test.

I also tried testing with a drastically inflated qdisc queue size (i.e. 1000000 as opposed to the default of 1000 - overkill for sure, but I just wanted to eliminate any ambiguity for this quick test), and with that change the SCTP behaviour now *much* more closely resembles the TCP behaviour.  Not quite as good as TCP (that is, it does not necessarily scale linearly), but by adding more and more associations I do indeed get significantly greater throughput.
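
For reference, an in-place change of that sort would look something like this (a sketch only; the exact commands used aren't quoted here):

[root@slot2 ~]# tc qdisc change dev p19p1 root netem delay 25ms limit 1000000
[root@slot2 ~]# tc qdisc change dev p19p2 root netem delay 25ms limit 1000000
(and likewise on slot3)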

Granted, this is all based only on some quick tests that I just ran, but the results certainly look promising as far as clearing up this 'scaling' issue goes.

Thanks a great deal for your valuable input!





-----Original Message-----
From: Vlad Yasevich [mailto:vyasevich@gmail.com] 
Sent: May-02-14 3:39 PM
To: Butler, Peter; Neil Horman
Cc: linux-sctp@vger.kernel.org
Subject: Re: SCTP throughput does not scale

On 05/02/2014 03:13 PM, Butler, Peter wrote:
> Recall that the issue here isn't that TCP outperforms SCTP - i.e. that 
> it has higher throughput overall - but that TCP (and UDP) scale up 
> when more connections are added, whereas SCTP does not.  So while 
> changing the message size (say, from 1000 bytes to 1452 bytes) and 
> modifying the GSO/TSO/GRO/LRO NIC settings does indeed change the 
> overall SCTP and TCP throughput (and closes the throughput gap between 
> these protocols), the fact remains that I can still then double the 
> overall TCP system throughput by adding in a second TCP connection, 
> whereas I cannot double the SCTP throughput by adding in a second SCTP 
> association.  (Again, in the latter case the overall throughput 
> remains constant with the two associations now carrying half the 
> traffic as the lone association did in the former case.)
> 

Right, I understand.  However, TCP will end up sending fewer packets than SCTP due to the stream nature of TCP.  So, you may not be hitting a netem drop limitation with TCP.  This would be an interesting data point.

Run a 2 stream TCP perf session with default netem settings and see if you have qdisc drops.

Then run a 2 stream SCTP perf session and check for drops.

-vlad

> 
> 
> -----Original Message-----
> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
> Sent: May-02-14 1:34 PM
> To: Butler, Peter; Neil Horman
> Cc: linux-sctp@vger.kernel.org
> Subject: Re: SCTP throughput does not scale
> 
> On 05/02/2014 01:10 PM, Butler, Peter wrote:
>> [root@slot2 ~]# tc -s qdisc show
>> qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms 
>> Sent 590200 bytes 7204 pkt (dropped 0, overlimits 0 requeues 0) 
>> backlog 0b 0p requeues 0 qdisc netem 8001: dev p19p1 root refcnt 65 
>> limit 1000 delay 25.0ms  Sent 997332352 bytes 946411 pkt (dropped 
>> 478, overlimits 0 requeues 1)  backlog 114b 1p requeues 1
>>
> 
> Thanks.  The above shows a drop of 478 packets.  You might try growing 
> you queue size.  Remember that SCTP is very much packet oriented and 
> with your size of 1000 bytes, each message ends up taking up
> 1 under-utilized packet.
> 
> Meanwhile TCP will coalesce your 1000 byte writes into full mss sized writes (plug GSO/TSO if you are still using it).  That allows TCP to much more effectively utilized the packets.
> 
> -vlad
> 
>>
>> [root@slot3 ~]# tc -s qdisc show
>> qdisc netem 8002: dev p19p2 root refcnt 65 limit 1000 delay 25.0ms 
>> Sent 90352 bytes 1666 pkt (dropped 0, overlimits 0 requeues 0) 
>> backlog 0b 0p requeues 0 qdisc netem 8001: dev p19p1 root refcnt 65 
>> limit 1000 delay 25.0ms  Sent 29544962 bytes 475167 pkt (dropped 0, 
>> overlimits 0 requeues 2)  backlog 130b 1p requeues 2
>>
>>
>>
>>
>> -----Original Message-----
>> From: Vlad Yasevich [mailto:vyasevich@gmail.com]
>> Sent: May-02-14 1:07 PM
>> To: Butler, Peter; Neil Horman
>> Cc: linux-sctp@vger.kernel.org
>> Subject: Re: SCTP throughput does not scale
>>
>> On 05/02/2014 12:33 PM, Butler, Peter wrote:
>>> The only entries from "tc qdisc show" are the ones used to implement the 50 ms RTT, which applies to all packet types (not just SCTP).
>>>
>>
>> I am assuming that you are using netem.  What is the queue length?
>> What is the output of
>>  # tc -s qdisc show
>>
>> look like.
>>
>> Thanks
>> -vlad
>>
>>> As for dropped frames, are you referring to SctpInPktDiscards?    SctpInPktDiscards is zero or very small (compared to the total number of transmitted packets).  For example, starting with all stats in /proc/net/sctp/snmp zeroed out, and then running one minute's worth of traffic with the same setup (50 ms RTT, 1000-byte messages, 2 MB tx/rx buffer size) I get the following data in /proc/net/sctp/snmp when running two parallel associations (only relevant lines shown here):
>>>
>>> client side (sending DATA):
>>> SctpOutSCTPPacks                        938945
>>> SctpInSCTPPacks                         473209
>>> SctpInPktDiscards                       0
>>> SctpInDataChunkDiscards                 0
>>>
>>>
>>> server side (receiving DATA):
>>> SctpOutSCTPPacks                        473209
>>> SctpInSCTPPacks                         938457
>>> SctpInPktDiscards                       0
>>> SctpInDataChunkDiscards                 0
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: linux-sctp-owner@vger.kernel.org 
>>> [mailto:linux-sctp-owner@vger.kernel.org] On Behalf Of Neil Horman
>>> Sent: May-02-14 9:36 AM
>>> To: Butler, Peter
>>> Cc: linux-sctp@vger.kernel.org
>>> Subject: Re: SCTP throughput does not scale
>>>
>>> On Fri, May 02, 2014 at 11:45:00AM +0000, Butler, Peter wrote:
>>>> I have tested with /proc/sys/net/sctp/[snd|rcv]buf_policy set to 0 and to 1.  I get the same behaviour both ways.  Note that my associations are all TCP-style SOCK_DGRAM associations, and not UDP-style SOCK_SEQPACKET associations.  As such, each association has its own socket - rather than all the associations sharing a single socket - and thus, to my understanding, the parameters in question will have no effect (as my testing has shown).  
>>>>
>>> You're correct, if you're using TCP style associations, the above policies won't change much.
>>>
>>> Such consistent throughput sharing though still seems odd.  You don't have any traffic shaping or policing implimented on your network devices do you?  Either on your sending or receiving system?  tc qdisc show would be able to tell you.
>>> Such low throughput on a 10G interface seems like it could not be much other than that.  Are you seeing any droped frames in /proc/net/sctp/snmp or in /proc/net/snmp[6]?
>>>
>>> Neil
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" 
>>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-05-02 20:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-01 17:55 SCTP throughput does not scale Butler, Peter
2014-05-01 22:51 ` Vlad Yasevich
2014-05-02  6:00 ` Butler, Peter
2014-05-02 11:34 ` Neil Horman
2014-05-02 11:45 ` Butler, Peter
2014-05-02 13:35 ` Neil Horman
2014-05-02 16:33 ` Butler, Peter
2014-05-02 16:37 ` Neil Horman
2014-05-02 16:52 ` Butler, Peter
2014-05-02 17:07 ` Vlad Yasevich
2014-05-02 17:10 ` Butler, Peter
2014-05-02 17:34 ` Vlad Yasevich
2014-05-02 19:13 ` Butler, Peter
2014-05-02 19:39 ` Vlad Yasevich
2014-05-02 20:14 ` Butler, Peter
