netdev.vger.kernel.org archive mirror
* retransmissions/out-of-order packets with large (~1 MB) writes/reads w/ TCP (SOCK_STREAM) over localhost loopback
@ 2020-02-25 15:08 Ron Rechenmacher
From: Ron Rechenmacher @ 2020-02-25 15:08 UTC
  To: netdev

I'm seeing this and have been googling for days to try to determine if I should be
surprised by this (which I am) or if anyone else is seeing it, but I haven't found any answers.
Apologies if I'm just missing something. Someone mentioned a loopback specification,
but I can't find it.

Here's one thing that I'm doing (tried on several modern kernels, including 5.5.4):

sudo tcpdump -s78 -wt.tcpdump -ilo port 7001 & tcpdump_pid=$!; \
taskset -c 1 ./tcp_loopback.py -s -b1048576 & sleep .5; taskset -c 2 ./tcp_loopback.py -c -b1048576 --count=8192; \
sudo kill $tcpdump_pid; \
tshark -r t.tcpdump | grep -i retrans

The tcp_loopback.py script is available at home.fnal.gov/~ron/tcp_loopback.py

The heart of the server portion (-s) is:

     sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
     sock.setsockopt( socket.SOL_SOCKET, socket.SO_REUSEADDR, 1 )
     sock.bind( ('127.0.0.1', port) )
     sock.listen( 4 )
     sockconn,address = sock.accept()
     while 1:
         data = sockconn.recv(bs)
         if opargs['-v']=='': print('received: '+str(len(data)))
         if len(data) == 0:
             if opargs['-v']=='': print('0 data, closing')
             break

The heart of the client portion (-c) is:

     sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
     sock.connect( ('127.0.0.1',port) )
     for xx in range(cnt): sock.send( b'*'*bs )  # bytes literal, required by Python 3

The probability of retransmissions seems to increase with larger (e.g. 2M, 4M) writes/reads.

I've read (e.g. Documentation/networking/scaling.rst) about out-of-order issues related
to scheduling on different cores, hence the use of taskset above.
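For reference, the same pinning can be done from inside the script instead of with taskset. This is only a minimal sketch, assuming Linux and Python 3; the helper name pin_current_process is hypothetical and is not part of tcp_loopback.py.

```python
import os


def pin_current_process(cpu):
    """Restrict the calling process to a single CPU (Linux only).

    Roughly equivalent to launching the process with `taskset -c <cpu>`.
    Returns the resulting affinity set so the caller can verify it.
    """
    # pid 0 means "the calling process".
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)
```

The server could call pin_current_process(1) and the client pin_current_process(2) before creating their sockets, matching the taskset arguments above.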

Is there a way to prevent this from happening (while still using large writes/reads at
high rate)?

With loopback, I don't really know if I'm looking at the send processing sending things
out-of-order or the receive processing receiving things out-of-order. My ultimate goal is to
establish a baseline for low-latency inter-node transmission in a 100 Gi, high congestion
(many-to-one) environment. I developed an application which uses the "debug socket" to get
retransmission information and I was surprised to see retransmissions on localhost.
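A retransmission count can also be read per connection without a packet capture. The sketch below is not the application mentioned above; it swaps in a different technique, the TCP_INFO socket option, and assumes Linux, where the start of struct tcp_info (include/uapi/linux/tcp.h) is eight one-byte fields followed by 32-bit counters, with tcpi_total_retrans as the 24th counter. The function names are hypothetical.

```python
import socket
import struct

# Assumed layout of the leading part of Linux's struct tcp_info:
# 8 single-byte fields, then 24 unsigned 32-bit fields, the last of
# which is tcpi_total_retrans. "=" selects native byte order with
# standard sizes and no padding (8 + 24*4 = 104 bytes).
_TCP_INFO_PREFIX = struct.Struct("=8B24I")


def parse_total_retrans(raw):
    """Extract tcpi_total_retrans from a raw tcp_info buffer."""
    if len(raw) < _TCP_INFO_PREFIX.size:
        raise ValueError("tcp_info buffer too short")
    fields = _TCP_INFO_PREFIX.unpack_from(raw)
    return fields[-1]


def total_retrans(sock):
    """Return the total retransmitted segments for a connected TCP socket."""
    # Ask for more than we parse; the kernel returns at most its own
    # struct size, and older kernels may return fewer bytes.
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 256)
    return parse_total_retrans(raw)
```

Calling total_retrans(sock) on the client socket after the send loop would give a count to compare against the tshark output.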

Can anyone please help me understand what's happening and if there are any knobs to turn to
eliminate retransmission while still maximizing data rate?

Thanks,
Ron

Example output:

/home/ron/notes
ron@ronlap77 :^) sudo tcpdump -s78 -wt.tcpdump -ilo port 7001 & tcpdump_pid=$!; \
 > taskset -c 1 ./tcp_loopback.py -s -b1048576 & sleep .5; taskset -c 2 ./tcp_loopback.py -c -b1048576 --count=8192; \
 > sudo kill $tcpdump_pid; \
 > tshark -r t.tcpdump | grep -i retrans
[1] 31571
[2] 31572
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 78 bytes
207707 packets captured
415440 packets received by filter
0 packets dropped by kernel
[2]+  Done                    taskset -c 1 ./tcp_loopback.py -s -b1048576
77373   0.442572    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Retransmission] 44842 → 7001 [ACK] Seq=3201826393 Ack=1 Win=65536 Len=65483 TSval=2684544027 TSecr=2684544026
77374   0.442574    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Retransmission] 44842 → 7001 [ACK] Seq=3201891876 Ack=1 Win=65536 Len=65483 TSval=2684544027 TSecr=2684544026
77375   0.442576    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Retransmission] 44842 → 7001 [ACK] Seq=3201957359 Ack=1 Win=65536 Len=65483 TSval=2684544027 TSecr=2684544026
79452   0.454359    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Spurious Retransmission] 44842 → 7001 [ACK] Seq=3313696610 Ack=1 Win=65536 Len=65483 TSval=2684544038 TSecr=2684544038
79453   0.454362    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Retransmission] 44842 → 7001 [ACK] Seq=3313893059 Ack=1 Win=65536 Len=65483 TSval=2684544038 TSecr=2684544038
79454   0.454365    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Retransmission] 44842 → 7001 [ACK] Seq=3313958542 Ack=1 Win=65536 Len=65483 TSval=2684544038 TSecr=2684544038
79455   0.454367    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Retransmission] 44842 → 7001 [ACK] Seq=3314024025 Ack=1 Win=65536 Len=65483 TSval=2684544038 TSecr=2684544038
79456   0.454370    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Retransmission] 44842 → 7001 [ACK] Seq=3314089508 Ack=1 Win=65536 Len=65483 TSval=2684544038 TSecr=2684544038
79457   0.454373    127.0.0.1 → 127.0.0.1    TCP 65549 [TCP Retransmission] 44842 → 7001 [ACK] Seq=3314154991 Ack=1 Win=65536 Len=65483 TSval=2684544038 TSecr=2684544038
[1]+  Done                    sudo tcpdump -s78 -wt.tcpdump -ilo port 7001
--2020-02-25_09:02:16--