* TCP socket send return EAGAIN unexpectedly when sending small fragments
@ 2022-06-10 12:48 Ronny Meeus
  2022-06-10 14:21 ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Ronny Meeus @ 2022-06-10 12:48 UTC (permalink / raw)
  To: netdev; +Cc: Ronny Meeus

Hello

I have a small test application written in C that creates a local (in
process) TCP channel via loopback.
(kernel version is 3.10.0 but the same issue is also seen for example
on a 4.9 kernel).

On the client side, an SO_SNDBUF of 5Kb is configured (which is
doubled by the kernel) while on the server side the default size is
used.
Both client and server side are running in non-blocking mode.

The server side is not reading its socket so all data is simply queued
in the socket's receive buffer.
On the client side, the client is writing data into the socket in a
loop until the write returns an error (EAGAIN in this case).
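
For reference, here is a minimal sketch of the client side of the test
(purely illustrative: the port number and the missing error handling are
assumptions, not the exact tcp_client_server source; the real program takes
the message size and optionally an SO_SNDBUF value as arguments):

/* sketch of the non-blocking client: connect over loopback, shrink
 * SO_SNDBUF, then write until send() fails (EAGAIN expected).
 * Port 6000 is made up for this sketch. */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
    int size = argc > 1 ? atoi(argv[1]) : 106;    /* message size */
    int sndbuf = argc > 2 ? atoi(argv[2]) : 5000; /* requested SO_SNDBUF */
    char *msg = calloc(1, size);

    printf("using data size %d\n", size);
    printf("client send buffer size %d\n", sndbuf);

    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(6000) };
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
    connect(fd, (struct sockaddr *)&addr, sizeof(addr));
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    socklen_t len = sizeof(sndbuf);
    getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len);
    printf("client socket snd_buf: %d\n", sndbuf); /* doubled by the kernel */

    int count = 0;
    for (;;) {
        if (send(fd, msg, size, 0) < 0) {
            printf("ERRNO = %d count=%d\n", errno, count);
            break;
        }
        count++;
    }
    return 0;
}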

Depending on the size of data I send on this socket, I get the EAGAIN
on the client side already after a small number of messages.
For example when sending 106 byte messages, I see the first EAGAIN
after 21 write calls:
# ./tcp_client_server 106
using data size 106
client send buffer size 5000
client socket snd_buf: 10000
ERRNO = 11 count=21

The same is observed for all sizes smaller than or equal to 107 bytes.

When getting the socket stats using ss I see all data (2226b) pending
in the socket on the server side:
# ss -tmi  | grep -i X11 -A 1
ESTAB      0      0      127.0.0.1:59792                127.0.0.1:x11
skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
rto:202 rtt:1.276/2.504 mss:21888 rcvmss:536 advmss:65483 cwnd:10
bytes_acked:2227 segs_out:24 segs_in:18 send 1372.3Mbps lastsnd:3883
lastrcv:771546999 lastack:3883 pacing_rate 2744.3Mbps retrans:0/1
rcv_space:43690
--
ESTAB      2226   0      127.0.0.1:x11
127.0.0.1:59792
skmem:(r4608,rb1061488,t0,tb2626560,f3584,w0,o0,bl0,d1) cubic
wscale:7,7 rto:200 ato:40 mss:21888 rcvmss:536 advmss:65483 cwnd:10
bytes_received:2226 segs_out:17 segs_in:24 lastsnd:3893 lastrcv:3893
lastack:3883 rcv_space:43690


When sending larger messages, the EAGAIN is only seen after 2116 writes.
# ./tcp_client_server 108
using data size 108
client send buffer size 5000
client socket snd_buf: 10000
ERRNO = 11 count=2116

Again, the ss shows all data being present on the server side (108 *
2116 = 228528)
ESTAB      228528 0      127.0.0.1:x11
127.0.0.1:59830
skmem:(r976896,rb1061488,t0,tb2626560,f2048,w0,o0,bl0,d1) cubic
wscale:7,7 rto:200 ato:80 mss:21888 rcvmss:536 advmss:65483 cwnd:10
bytes_received:228528 segs_out:436 segs_in:2119 lastsnd:3615
lastrcv:3606 lastack:3596 rcv_rtt:1 rcv_space:43690
--
ESTAB      0      0      127.0.0.1:59830                127.0.0.1:x11
skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
rto:206 rtt:5.016/9.996 mss:21888 rcvmss:536 advmss:65483 cwnd:10
bytes_acked:228529 segs_out:2119 segs_in:437 send 349.1Mbps
lastsnd:3596 lastrcv:771704718 lastack:3566 pacing_rate 698.1Mbps
retrans:0/1 rcv_space:43690



When I enlarge the SNDBUF on the client side to for example 10K, I see
that more messages can be sent:
# ./tcp_client_server 106 10000
using data size 106
client send buffer size 10000
client socket snd_buf: 20000
ERRNO = 11 count=1291

I also captured the packets on the interface using wireshark and it
looks like the error is returned after the TCP layer has done a
retransmit (after 10ms) of the last packet sent on the connection.

10:12:38.186451 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1165:1265, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.186461 IP localhost.etlservicemgr > localhost.48470: Flags
[.], ack 1265, win 342, options [nop,nop,TS val 593860562 ecr
593860562], length 0
10:12:38.186478 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1265:1365, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.186488 IP localhost.etlservicemgr > localhost.48470: Flags
[.], ack 1365, win 342, options [nop,nop,TS val 593860562 ecr
593860562], length 0
10:12:38.186505 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1365:1465, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.186516 IP localhost.etlservicemgr > localhost.48470: Flags
[.], ack 1465, win 342, options [nop,nop,TS val 593860562 ecr
593860562], length 0
10:12:38.186533 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1465:1565, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.186543 IP localhost.etlservicemgr > localhost.48470: Flags
[.], ack 1565, win 342, options [nop,nop,TS val 593860562 ecr
593860562], length 0
10:12:38.186560 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1565:1665, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.186578 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1665:1765, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.186595 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1765:1865, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.186615 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1865:1965, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.186632 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
593860562 ecr 593860562], length 100
10:12:38.196064 IP localhost.48470 > localhost.etlservicemgr: Flags
[P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
593860572 ecr 593860562], length 100
10:12:38.196128 IP localhost.etlservicemgr > localhost.48470: Flags
[.], ack 2065, win 342, options [nop,nop,TS val 593860572 ecr
593860562,nop,nop,sack 1 {1965:2065}], length 0

Now my question is: how is it possible that I can only send 21
messages of 106 bytes before I see the EAGAIN, while with larger
messages I can send 2K+ messages?

Thanks

Best regards,
Ronny


* Re: TCP socket send return EAGAIN unexpectedly when sending small fragments
  2022-06-10 12:48 TCP socket send return EAGAIN unexpectedly when sending small fragments Ronny Meeus
@ 2022-06-10 14:21 ` Eric Dumazet
  2022-06-10 15:16   ` Ronny Meeus
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2022-06-10 14:21 UTC (permalink / raw)
  To: Ronny Meeus, netdev


On 6/10/22 05:48, Ronny Meeus wrote:
> Hello
>
> I have a small test application written in C that creates a local (in
> process) TCP channel via loopback.
> (kernel version is 3.10.0 but the same issue is also seen for example
> on a 4.9 kernel).
>
> On the client side, an SO_SNDBUF of 5Kb is configured (which is
> doubled by the kernel) while on the server side the default size is
> used.
> Both client and server side are running in non-blocking mode.
>
> The server side is not reading its socket so all data is simply queued
> in the socket's receive buffer.
> On the client side, the client is writing data into the socket in a
> loop until the write returns an error (EAGAIN in this case).
>
> Depending on the size of data I send on this socket, I get the EAGAIN
> on the client side already after a small number of messages.
> For example when sending 106 byte messages, I see the first EAGAIN
> after 21 write calls:
> # ./tcp_client_server 106
> using data size 106
> client send buffer size 5000
> client socket snd_buf: 10000
> ERRNO = 11 count=21
>
> The same is observed for all sizes smaller than or equal to 107 bytes.
>
> When getting the socket stats using ss I see all data (2226b) pending
> in the socket on the server side:
> # ss -tmi  | grep -i X11 -A 1
> ESTAB      0      0      127.0.0.1:59792                127.0.0.1:x11
> skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
> rto:202 rtt:1.276/2.504 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> bytes_acked:2227 segs_out:24 segs_in:18 send 1372.3Mbps lastsnd:3883
> lastrcv:771546999 lastack:3883 pacing_rate 2744.3Mbps retrans:0/1
> rcv_space:43690
> --
> ESTAB      2226   0      127.0.0.1:x11
> 127.0.0.1:59792
> skmem:(r4608,rb1061488,t0,tb2626560,f3584,w0,o0,bl0,d1) cubic
> wscale:7,7 rto:200 ato:40 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> bytes_received:2226 segs_out:17 segs_in:24 lastsnd:3893 lastrcv:3893
> lastack:3883 rcv_space:43690
>
>
> When sending larger messages, the EAGAIN only is seen after 2116 writes.
> # ./tcp_client_server 108
> using data size 108
> client send buffer size 5000
> client socket snd_buf: 10000
> ERRNO = 11 count=2116
>
> Again, the ss shows all data being present on the server side (108 *
> 2116 = 228528)
> ESTAB      228528 0      127.0.0.1:x11
> 127.0.0.1:59830
> skmem:(r976896,rb1061488,t0,tb2626560,f2048,w0,o0,bl0,d1) cubic
> wscale:7,7 rto:200 ato:80 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> bytes_received:228528 segs_out:436 segs_in:2119 lastsnd:3615
> lastrcv:3606 lastack:3596 rcv_rtt:1 rcv_space:43690
> --
> ESTAB      0      0      127.0.0.1:59830                127.0.0.1:x11
> skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
> rto:206 rtt:5.016/9.996 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> bytes_acked:228529 segs_out:2119 segs_in:437 send 349.1Mbps
> lastsnd:3596 lastrcv:771704718 lastack:3566 pacing_rate 698.1Mbps
> retrans:0/1 rcv_space:43690
>
>
>
> When I enlarge the SNDBUF on the client side to for example 10K, I see
> that more messages can be sent:
> # ./tcp_client_server 106 10000
> using data size 106
> client send buffer size 10000
> client socket snd_buf: 20000
> ERRNO = 11 count=1291
>
> I also captured the packets on the interface using wireshark and it
> looks like the error is returned after the TCP layer has done a
> retransmit (after 10ms) of the last packet sent on the connection.
>
> 10:12:38.186451 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1165:1265, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186461 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 1265, win 342, options [nop,nop,TS val 593860562 ecr
> 593860562], length 0
> 10:12:38.186478 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1265:1365, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186488 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 1365, win 342, options [nop,nop,TS val 593860562 ecr
> 593860562], length 0
> 10:12:38.186505 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1365:1465, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186516 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 1465, win 342, options [nop,nop,TS val 593860562 ecr
> 593860562], length 0
> 10:12:38.186533 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1465:1565, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186543 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 1565, win 342, options [nop,nop,TS val 593860562 ecr
> 593860562], length 0
> 10:12:38.186560 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1565:1665, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186578 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1665:1765, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186595 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1765:1865, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186615 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1865:1965, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.186632 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
> 593860562 ecr 593860562], length 100
> 10:12:38.196064 IP localhost.48470 > localhost.etlservicemgr: Flags
> [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
> 593860572 ecr 593860562], length 100
> 10:12:38.196128 IP localhost.etlservicemgr > localhost.48470: Flags
> [.], ack 2065, win 342, options [nop,nop,TS val 593860572 ecr
> 593860562,nop,nop,sack 1 {1965:2065}], length 0
>
> Now my question is: how is it possible that I can only send 21
> messages of 106 bytes before I see the EAGAIN while when sending
> larger messages I can send 2K+ messages.


This is because the kernel tracks kernel memory usage.

Small packets incur more overhead.

The doubling of the SO_SNDBUF user value is a heuristic based on the fact
that the kernel uses a 2x overhead estimation, while the effective overhead
can vary from ~1.001x to ~768x depending on the number of bytes per skb.
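
To make that concrete with a back-of-envelope sketch (the per-skb overhead
below is an assumed round number for illustration, not an exact kernel
constant): the send-buffer budget is charged skb->truesize, i.e. payload
plus per-skb bookkeeping and allocation rounding, so tiny writes exhaust
tb=10000 long before 10000 payload bytes are queued.

/* back-of-envelope: how many small writes fit in the send-buffer budget
 * when each queued skb is charged its truesize (payload + per-skb
 * overhead) rather than just its payload.  370 bytes of overhead is an
 * assumption for illustration only. */
#include <stdio.h>

int main(void)
{
    int budget = 10000;   /* tb as reported by ss (the doubled SO_SNDBUF) */
    int overhead = 370;   /* assumed per-skb overhead */

    for (int payload = 50; payload <= 1500; payload += 50) {
        int truesize = payload + overhead;
        printf("payload %4d -> truesize ~%4d -> ~%3d writes fit\n",
               payload, truesize, budget / truesize);
    }
    return 0;
}

With roughly 470 bytes charged per 106-byte write, about 21 writes fit in a
10000-byte budget, which is consistent with what the test shows; once
consecutive writes can be merged into fewer, larger skbs, the per-byte
overhead collapses, which is one plausible reading of why the 108-byte case
gets so much further.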




* Re: TCP socket send return EAGAIN unexpectedly when sending small fragments
  2022-06-10 14:21 ` Eric Dumazet
@ 2022-06-10 15:16   ` Ronny Meeus
  2022-06-10 15:21     ` David Laight
  2022-06-10 17:46     ` Eric Dumazet
  0 siblings, 2 replies; 9+ messages in thread
From: Ronny Meeus @ 2022-06-10 15:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Op vr 10 jun. 2022 om 16:21 schreef Eric Dumazet <erdnetdev@gmail.com>:
>
>
> On 6/10/22 05:48, Ronny Meeus wrote:
> > Hello
> >
> > I have a small test application written in C that creates a local (in
> > process) TCP channel via loopback.
> > (kernel version is 3.10.0 but the same issue is also seen for example
> > on a 4.9 kernel).
> >
> > On the client side, an SO_SNDBUF of 5Kb is configured (which is
> > doubled by the kernel) while on the server side the default size is
> > used.
> > Both client and server side are running in non-blocking mode.
> >
> > The server side is not reading its socket so all data is simply queued
> > in the socket's receive buffer.
> > On the client side, the client is writing data into the socket in a
> > loop until the write returns an error (EAGAIN in this case).
> >
> > Depending on the size of data I send on this socket, I get the EAGAIN
> > on the client side already after a small number of messages.
> > For example when sending 106 byte messages, I see the first EAGAIN
> > after 21 write calls:
> > # ./tcp_client_server 106
> > using data size 106
> > client send buffer size 5000
> > client socket snd_buf: 10000
> > ERRNO = 11 count=21
> >
> > The same is observed for all sizes smaller than or equal to 107 bytes.
> >
> > When getting the socket stats using ss I see all data (2226b) pending
> > in the socket on the server side:
> > # ss -tmi  | grep -i X11 -A 1
> > ESTAB      0      0      127.0.0.1:59792                127.0.0.1:x11
> > skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
> > rto:202 rtt:1.276/2.504 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> > bytes_acked:2227 segs_out:24 segs_in:18 send 1372.3Mbps lastsnd:3883
> > lastrcv:771546999 lastack:3883 pacing_rate 2744.3Mbps retrans:0/1
> > rcv_space:43690
> > --
> > ESTAB      2226   0      127.0.0.1:x11
> > 127.0.0.1:59792
> > skmem:(r4608,rb1061488,t0,tb2626560,f3584,w0,o0,bl0,d1) cubic
> > wscale:7,7 rto:200 ato:40 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> > bytes_received:2226 segs_out:17 segs_in:24 lastsnd:3893 lastrcv:3893
> > lastack:3883 rcv_space:43690
> >
> >
> > When sending larger messages, the EAGAIN only is seen after 2116 writes.
> > # ./tcp_client_server 108
> > using data size 108
> > client send buffer size 5000
> > client socket snd_buf: 10000
> > ERRNO = 11 count=2116
> >
> > Again, the ss shows all data being present on the server side (108 *
> > 2116 = 228528)
> > ESTAB      228528 0      127.0.0.1:x11
> > 127.0.0.1:59830
> > skmem:(r976896,rb1061488,t0,tb2626560,f2048,w0,o0,bl0,d1) cubic
> > wscale:7,7 rto:200 ato:80 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> > bytes_received:228528 segs_out:436 segs_in:2119 lastsnd:3615
> > lastrcv:3606 lastack:3596 rcv_rtt:1 rcv_space:43690
> > --
> > ESTAB      0      0      127.0.0.1:59830                127.0.0.1:x11
> > skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
> > rto:206 rtt:5.016/9.996 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> > bytes_acked:228529 segs_out:2119 segs_in:437 send 349.1Mbps
> > lastsnd:3596 lastrcv:771704718 lastack:3566 pacing_rate 698.1Mbps
> > retrans:0/1 rcv_space:43690
> >
> >
> >
> > When I enlarge the SNDBUF on the client side to for example 10K, I see
> > that more messages can be sent:
> > # ./tcp_client_server 106 10000
> > using data size 106
> > client send buffer size 10000
> > client socket snd_buf: 20000
> > ERRNO = 11 count=1291
> >
> > I also captured the packets on the interface using wireshark and it
> > looks like the error is returned after the TCP layer has done a
> > retransmit (after 10ms) of the last packet sent on the connection.
> >
> > 10:12:38.186451 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1165:1265, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.186461 IP localhost.etlservicemgr > localhost.48470: Flags
> > [.], ack 1265, win 342, options [nop,nop,TS val 593860562 ecr
> > 593860562], length 0
> > 10:12:38.186478 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1265:1365, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.186488 IP localhost.etlservicemgr > localhost.48470: Flags
> > [.], ack 1365, win 342, options [nop,nop,TS val 593860562 ecr
> > 593860562], length 0
> > 10:12:38.186505 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1365:1465, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.186516 IP localhost.etlservicemgr > localhost.48470: Flags
> > [.], ack 1465, win 342, options [nop,nop,TS val 593860562 ecr
> > 593860562], length 0
> > 10:12:38.186533 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1465:1565, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.186543 IP localhost.etlservicemgr > localhost.48470: Flags
> > [.], ack 1565, win 342, options [nop,nop,TS val 593860562 ecr
> > 593860562], length 0
> > 10:12:38.186560 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1565:1665, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.186578 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1665:1765, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.186595 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1765:1865, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.186615 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1865:1965, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.186632 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
> > 593860562 ecr 593860562], length 100
> > 10:12:38.196064 IP localhost.48470 > localhost.etlservicemgr: Flags
> > [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
> > 593860572 ecr 593860562], length 100
> > 10:12:38.196128 IP localhost.etlservicemgr > localhost.48470: Flags
> > [.], ack 2065, win 342, options [nop,nop,TS val 593860572 ecr
> > 593860562,nop,nop,sack 1 {1965:2065}], length 0
> >
> > Now my question is: how is it possible that I can only send 21
> > messages of 106 bytes before I see the EAGAIN while when sending
> > larger messages I can send 2K+ messages.
>
>
> This is because kernel tracks kernel memory usage.
>
> Small packets incur more overhead.
>
> The doubling of SO_SNDBUF user value, is an heuristic based on the fact
> the kernel
>
> uses a 2x overhead estimation,
>
> while effective overhead can vary from ~1.001x to ~768x depending on
> number of bytes per skb.
>

Hi Eric,

Thanks for the feedback, but it is not completely clear to me.

The small SNDBUF is on the client side, and a large part of the data
has already been ACKed by the receiving side.
Only the last 5 or 6 messages are waiting to be ACKed.
So I would expect that the biggest part of the overhead is located on
the receiving side where there is sufficient memory space (default
RCVBUF size is used).

If the 5 queued packets on the sending side are what causes the EAGAIN,
the real question is perhaps why the receiving side does not send the ACK
within 10ms, while for earlier messages the ACK was sent much sooner.

Do you have any clue which kernel function generates this EAGAIN?
And what would be the correct solution to this problem?

Best regards,
Ronny

>


* RE: TCP socket send return EAGAIN unexpectedly when sending small fragments
  2022-06-10 15:16   ` Ronny Meeus
@ 2022-06-10 15:21     ` David Laight
  2022-06-10 17:16       ` Ronny Meeus
  2022-06-10 17:46     ` Eric Dumazet
  1 sibling, 1 reply; 9+ messages in thread
From: David Laight @ 2022-06-10 15:21 UTC (permalink / raw)
  To: 'Ronny Meeus', Eric Dumazet; +Cc: netdev

...
> If the 5 queued packets on the sending side would cause the EAGAIN
> issue, the real question maybe is why the receiving side is not
> sending the ACK within the 10ms while for earlier messages the ACK is
> sent much sooner.

Have you disabled Nagle (TCP_NODELAY) ?

Nagle only really works for bulk data transfer (large sends)
and interactive sessions (command - response).
For nearly everything else it adds unwanted 100ms delays.
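
For reference, turning Nagle off is a one-line setsockopt on the connected
socket (a sketch, error handling left to the caller):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable Nagle: small writes are pushed out immediately instead of
 * being held back while earlier data is still unacknowledged. */
static int disable_nagle(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}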

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: TCP socket send return EAGAIN unexpectedly when sending small fragments
  2022-06-10 15:21     ` David Laight
@ 2022-06-10 17:16       ` Ronny Meeus
  2022-06-10 17:42         ` Willy Tarreau
  0 siblings, 1 reply; 9+ messages in thread
From: Ronny Meeus @ 2022-06-10 17:16 UTC (permalink / raw)
  To: David Laight; +Cc: Eric Dumazet, netdev

Op vr 10 jun. 2022 om 17:21 schreef David Laight <David.Laight@aculab.com>:
>
> ...
> > If the 5 queued packets on the sending side would cause the EAGAIN
> > issue, the real question maybe is why the receiving side is not
> > sending the ACK within the 10ms while for earlier messages the ACK is
> > sent much sooner.
>
> Have you disabled Nagle (TCP_NODELAY) ?

Yes, I enabled TCP_NODELAY so the Nagle algorithm is disabled.
I did a lot of tests over the last couple of days, but if I remember
correctly, enabling or disabling TCP_NODELAY does not influence the result.

> Nagle only really works for bulk data transfer (large sends)
> and interactive sessions (command - response).
> For nearly everything else it adds unwanted 100ms delays.
>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)


* Re: TCP socket send return EAGAIN unexpectedly when sending small fragments
  2022-06-10 17:16       ` Ronny Meeus
@ 2022-06-10 17:42         ` Willy Tarreau
  2022-06-10 18:14           ` Ronny Meeus
  0 siblings, 1 reply; 9+ messages in thread
From: Willy Tarreau @ 2022-06-10 17:42 UTC (permalink / raw)
  To: Ronny Meeus; +Cc: David Laight, Eric Dumazet, netdev

On Fri, Jun 10, 2022 at 07:16:06PM +0200, Ronny Meeus wrote:
> Op vr 10 jun. 2022 om 17:21 schreef David Laight <David.Laight@aculab.com>:
> >
> > ...
> > > If the 5 queued packets on the sending side would cause the EAGAIN
> > > issue, the real question maybe is why the receiving side is not
> > > sending the ACK within the 10ms while for earlier messages the ACK is
> > > sent much sooner.
> >
> > Have you disabled Nagle (TCP_NODELAY) ?
> 
> Yes I enabled TCP_NODELAY so the Nagle algo is disabled.
> I did a lot of tests over the last couple of days but if I remember well
> enable or disable TCP_NODELAY does not influence the result.

There are many possible causes for what you're observing. For example,
if your NIC has too small a tx ring and small buffers, you can imagine
that the N*106 bytes fit in the buffers but not the N*107, which causes
a tiny delay waiting for the Tx IRQ to recycle the buffers, and that
during this time your subsequent send()s are coalesced into larger
segments that are sent at once when using 107.

If you do not want packets to be sent individually and you know you
still have more to come, you need to put MSG_MORE on the send() flags
(or to disable TCP_NODELAY).
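
Roughly, that looks like this (a sketch; the buffers and lengths are
whatever your application is batching):

#include <sys/socket.h>
#include <sys/types.h>

/* Queue several small pieces with MSG_MORE so the stack may merge them,
 * then send the last piece without the flag so it gets flushed. */
static ssize_t send_batched(int fd, const void **pieces,
                            const size_t *lens, int n)
{
    ssize_t total = 0;
    for (int i = 0; i < n; i++) {
        int flags = (i < n - 1) ? MSG_MORE : 0;
        ssize_t r = send(fd, pieces[i], lens[i], flags);
        if (r < 0)
            return r;   /* e.g. EAGAIN on a non-blocking socket */
        total += r;
    }
    return total;
}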

Clearly, when running with TCP_NODELAY you're asking the whole stack
"do your best to send as fast as possible", which implies "without any
consideration for efficiency optimization". I've seen a situation in the
past where it was impossible to send any extra segment after a first
unacked PUSH was in flight. Simply sending full segments was enough to
considerably increase the performance. I analysed this as a result of
the SWS avoidance algorithm and concluded that it was normal in that
situation, though I've not witnessed it in a while.

So just keep in mind to try not to abuse TCP_NODELAY too much.

Willy


* Re: TCP socket send return EAGAIN unexpectedly when sending small fragments
  2022-06-10 15:16   ` Ronny Meeus
  2022-06-10 15:21     ` David Laight
@ 2022-06-10 17:46     ` Eric Dumazet
  2022-06-10 18:21       ` Ronny Meeus
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2022-06-10 17:46 UTC (permalink / raw)
  To: Ronny Meeus; +Cc: netdev


On 6/10/22 08:16, Ronny Meeus wrote:
> Op vr 10 jun. 2022 om 16:21 schreef Eric Dumazet <erdnetdev@gmail.com>:
>>
>> On 6/10/22 05:48, Ronny Meeus wrote:
>>> Hello
>>>
>>> I have a small test application written in C that creates a local (in
>>> process) TCP channel via loopback.
>>> (kernel version is 3.10.0 but the same issue is also seen for example
>>> on a 4.9 kernel).
>>>
>>> On the client side, an SO_SNDBUF of 5Kb is configured (which is
>>> doubled by the kernel) while on the server side the default size is
>>> used.
>>> Both client and server side are running in non-blocking mode.
>>>
>>> The server side is not reading its socket so all data is simply queued
>>> in the socket's receive buffer.
>>> On the client side, the client is writing data into the socket in a
>>> loop until the write returns an error (EAGAIN in this case).
>>>
>>> Depending on the size of data I send on this socket, I get the EAGAIN
>>> on the client side already after a small number of messages.
>>> For example when sending 106 byte messages, I see the first EAGAIN
>>> after 21 write calls:
>>> # ./tcp_client_server 106
>>> using data size 106
>>> client send buffer size 5000
>>> client socket snd_buf: 10000
>>> ERRNO = 11 count=21
>>>
>>> The same is observed for all sizes smaller than or equal to 107 bytes.
>>>
>>> When getting the socket stats using ss I see all data (2226b) pending
>>> in the socket on the server side:
>>> # ss -tmi  | grep -i X11 -A 1
>>> ESTAB      0      0      127.0.0.1:59792                127.0.0.1:x11
>>> skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
>>> rto:202 rtt:1.276/2.504 mss:21888 rcvmss:536 advmss:65483 cwnd:10
>>> bytes_acked:2227 segs_out:24 segs_in:18 send 1372.3Mbps lastsnd:3883
>>> lastrcv:771546999 lastack:3883 pacing_rate 2744.3Mbps retrans:0/1
>>> rcv_space:43690
>>> --
>>> ESTAB      2226   0      127.0.0.1:x11
>>> 127.0.0.1:59792
>>> skmem:(r4608,rb1061488,t0,tb2626560,f3584,w0,o0,bl0,d1) cubic
>>> wscale:7,7 rto:200 ato:40 mss:21888 rcvmss:536 advmss:65483 cwnd:10
>>> bytes_received:2226 segs_out:17 segs_in:24 lastsnd:3893 lastrcv:3893
>>> lastack:3883 rcv_space:43690
>>>
>>>
>>> When sending larger messages, the EAGAIN only is seen after 2116 writes.
>>> # ./tcp_client_server 108
>>> using data size 108
>>> client send buffer size 5000
>>> client socket snd_buf: 10000
>>> ERRNO = 11 count=2116
>>>
>>> Again, the ss shows all data being present on the server side (108 *
>>> 2116 = 228528)
>>> ESTAB      228528 0      127.0.0.1:x11
>>> 127.0.0.1:59830
>>> skmem:(r976896,rb1061488,t0,tb2626560,f2048,w0,o0,bl0,d1) cubic
>>> wscale:7,7 rto:200 ato:80 mss:21888 rcvmss:536 advmss:65483 cwnd:10
>>> bytes_received:228528 segs_out:436 segs_in:2119 lastsnd:3615
>>> lastrcv:3606 lastack:3596 rcv_rtt:1 rcv_space:43690
>>> --
>>> ESTAB      0      0      127.0.0.1:59830                127.0.0.1:x11
>>> skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
>>> rto:206 rtt:5.016/9.996 mss:21888 rcvmss:536 advmss:65483 cwnd:10
>>> bytes_acked:228529 segs_out:2119 segs_in:437 send 349.1Mbps
>>> lastsnd:3596 lastrcv:771704718 lastack:3566 pacing_rate 698.1Mbps
>>> retrans:0/1 rcv_space:43690
>>>
>>>
>>>
>>> When I enlarge the SNDBUF on the client side to for example 10K, I see
>>> that more messages can be sent:
>>> # ./tcp_client_server 106 10000
>>> using data size 106
>>> client send buffer size 10000
>>> client socket snd_buf: 20000
>>> ERRNO = 11 count=1291
>>>
>>> I also captured the packets on the interface using wireshark and it
>>> looks like the error is returned after the TCP layer has done a
>>> retransmit (after 10ms) of the last packet sent on the connection.
>>>
>>> 10:12:38.186451 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1165:1265, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.186461 IP localhost.etlservicemgr > localhost.48470: Flags
>>> [.], ack 1265, win 342, options [nop,nop,TS val 593860562 ecr
>>> 593860562], length 0
>>> 10:12:38.186478 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1265:1365, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.186488 IP localhost.etlservicemgr > localhost.48470: Flags
>>> [.], ack 1365, win 342, options [nop,nop,TS val 593860562 ecr
>>> 593860562], length 0
>>> 10:12:38.186505 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1365:1465, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.186516 IP localhost.etlservicemgr > localhost.48470: Flags
>>> [.], ack 1465, win 342, options [nop,nop,TS val 593860562 ecr
>>> 593860562], length 0
>>> 10:12:38.186533 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1465:1565, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.186543 IP localhost.etlservicemgr > localhost.48470: Flags
>>> [.], ack 1565, win 342, options [nop,nop,TS val 593860562 ecr
>>> 593860562], length 0
>>> 10:12:38.186560 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1565:1665, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.186578 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1665:1765, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.186595 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1765:1865, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.186615 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1865:1965, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.186632 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
>>> 593860562 ecr 593860562], length 100
>>> 10:12:38.196064 IP localhost.48470 > localhost.etlservicemgr: Flags
>>> [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
>>> 593860572 ecr 593860562], length 100
>>> 10:12:38.196128 IP localhost.etlservicemgr > localhost.48470: Flags
>>> [.], ack 2065, win 342, options [nop,nop,TS val 593860572 ecr
>>> 593860562,nop,nop,sack 1 {1965:2065}], length 0
>>>
>>> Now my question is: how is it possible that I can only send 21
>>> messages of 106 bytes before I see the EAGAIN while when sending
>>> larger messages I can send 2K+ messages.
>>
>> This is because kernel tracks kernel memory usage.
>>
>> Small packets incur more overhead.
>>
>> The doubling of SO_SNDBUF user value, is an heuristic based on the fact
>> the kernel
>>
>> uses a 2x overhead estimation,
>>
>> while effective overhead can vary from ~1.001x to ~768x depending on
>> number of bytes per skb.
>>
> Hi Eric,
>
> thanks for the feedback but it is not completely clear to me.
>
> The small SNDBUF is on the client side and a large part of the data
> has already ACKed by the receiving side.
> Only the last 5 or 6 messages are waiting to be ACKed.
> So I would expect that the biggest part of the overhead is located on
> the receiving side where there is sufficient memory space (default
> RCVBUF size is used).
>
> If the 5 queued packets on the sending side would cause the EAGAIN
> issue, the real question maybe is why the receiving side is not
> sending the ACK within the 10ms while for earlier messages the ACK is
> sent much sooner.


Delayed ACKs on the receiver, when receiving small packets.
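
If you want to check whether delayed ACKs really are what holds up the
last segments, one way is to ask for immediate ACKs on the receiving
socket; a sketch (keeping in mind that TCP_QUICKACK is not sticky and the
kernel may fall back to delayed ACKs):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Ask TCP to ACK immediately instead of delaying; the flag is not
 * permanent, so it typically has to be re-armed around each read. */
static int quickack(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
}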


>
> Do you have any clue which kernel function does generate this EAGAIN?
> And what would be the correct solution to this problem?


I do not really see what problem you have.

Perhaps you should not set SO_SNDBUF, and let the kernel decide
(auto-tuning and auto-corking).
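
Concretely that just means not calling setsockopt(SO_SNDBUF) at all; if
you still want to see what the stack picked, a sketch:

#include <stdio.h>
#include <sys/socket.h>

/* Leave SO_SNDBUF alone and only read back the auto-tuned value
 * (for logging/diagnostics). */
static void log_sndbuf(int fd)
{
    int sndbuf = 0;
    socklen_t len = sizeof(sndbuf);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0)
        printf("auto-tuned SO_SNDBUF: %d bytes\n", sndbuf);
}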


>
> Best regards,
> Ronny
>


* Re: TCP socket send return EAGAIN unexpectedly when sending small fragments
  2022-06-10 17:42         ` Willy Tarreau
@ 2022-06-10 18:14           ` Ronny Meeus
  0 siblings, 0 replies; 9+ messages in thread
From: Ronny Meeus @ 2022-06-10 18:14 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Laight, Eric Dumazet, netdev

Op vr 10 jun. 2022 om 19:42 schreef Willy Tarreau <w@1wt.eu>:
>
> On Fri, Jun 10, 2022 at 07:16:06PM +0200, Ronny Meeus wrote:
> > Op vr 10 jun. 2022 om 17:21 schreef David Laight <David.Laight@aculab.com>:
> > >
> > > ...
> > > > If the 5 queued packets on the sending side would cause the EAGAIN
> > > > issue, the real question maybe is why the receiving side is not
> > > > sending the ACK within the 10ms while for earlier messages the ACK is
> > > > sent much sooner.
> > >
> > > Have you disabled Nagle (TCP_NODELAY) ?
> >
> > Yes I enabled TCP_NODELAY so the Nagle algo is disabled.
> > I did a lot of tests over the last couple of days but if I remember well
> > enable or disable TCP_NODELAY does not influence the result.
>
> There are many possible causes for what you're observing. For example
> if your NIC has too small a tx ring and small buffers, you can imagine
> that the Nx106 bytes fit in the buffers but not the N*107, which cause
> a tiny delay waiting for the Tx IRQ to recycle the buffers, and that
> during this time your subsequent send() are coalesced into larger
> segments that are sent at once when using 107.
>

The test is running over the loopback interface ...

> If you do not want packets to be sent individually and you know you
> still have more to come, you need to put MSG_MORE on the send() flags
> (or to disable TCP_NODELAY).

Like I said, TCP_NODELAY does not have an impact.

> Clearly, when running with TCP_NODELAY you're asking the whole stack
> "do your best to send as fast as possible", which implies "without any
> consideration for efficiency optimization". I've seen a situation in the
> past where it was impossible to send any extra segment after a first
> unacked PUSH was in flight. Simply sending full segments was enough to
> considerably increase the performance. I analysed this as a result of
> the SWS avoidance algorithm and concluded that it was normal in that
> situation, though I've not witnessed it anymore in a while.
>
> So just keep in mind to try not to abuse TCP_NODELAY too much.

Thanks Willy for the feedback.

>
> Willy


* Re: TCP socket send return EAGAIN unexpectedly when sending small fragments
  2022-06-10 17:46     ` Eric Dumazet
@ 2022-06-10 18:21       ` Ronny Meeus
  0 siblings, 0 replies; 9+ messages in thread
From: Ronny Meeus @ 2022-06-10 18:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Op vr 10 jun. 2022 om 19:47 schreef Eric Dumazet <eric.dumazet@gmail.com>:
>
>
> On 6/10/22 08:16, Ronny Meeus wrote:
> > Op vr 10 jun. 2022 om 16:21 schreef Eric Dumazet <erdnetdev@gmail.com>:
> >>
> >> On 6/10/22 05:48, Ronny Meeus wrote:
> >>> Hello
> >>>
> >>> I have a small test application written in C that creates a local (in
> >>> process) TCP channel via loopback.
> >>> (kernel version is 3.10.0 but the same issue is also seen for example
> >>> on a 4.9 kernel).
> >>>
> >>> On the client side, an SO_SNDBUF of 5Kb is configured (which is
> >>> doubled by the kernel) while on the server side the default size is
> >>> used.
> >>> Both client and server side are running in non-blocking mode.
> >>>
> >>> The server side is not reading its socket so all data is simply queued
> >>> in the socket's receive buffer.
> >>> On the client side, the client is writing data into the socket in a
> >>> loop until the write returns an error (EAGAIN in this case).
> >>>
> >>> Depending on the size of data I send on this socket, I get the EAGAIN
> >>> on the client side already after a small number of messages.
> >>> For example when sending 106 byte messages, I see the first EAGAIN
> >>> after 21 write calls:
> >>> # ./tcp_client_server 106
> >>> using data size 106
> >>> client send buffer size 5000
> >>> client socket snd_buf: 10000
> >>> ERRNO = 11 count=21
> >>>
> >>> The same is observed for all sizes smaller than or equal to 107 bytes.
> >>>
> >>> When getting the socket stats using ss I see all data (2226b) pending
> >>> in the socket on the server side:
> >>> # ss -tmi  | grep -i X11 -A 1
> >>> ESTAB      0      0      127.0.0.1:59792                127.0.0.1:x11
> >>> skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
> >>> rto:202 rtt:1.276/2.504 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> >>> bytes_acked:2227 segs_out:24 segs_in:18 send 1372.3Mbps lastsnd:3883
> >>> lastrcv:771546999 lastack:3883 pacing_rate 2744.3Mbps retrans:0/1
> >>> rcv_space:43690
> >>> --
> >>> ESTAB      2226   0      127.0.0.1:x11
> >>> 127.0.0.1:59792
> >>> skmem:(r4608,rb1061488,t0,tb2626560,f3584,w0,o0,bl0,d1) cubic
> >>> wscale:7,7 rto:200 ato:40 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> >>> bytes_received:2226 segs_out:17 segs_in:24 lastsnd:3893 lastrcv:3893
> >>> lastack:3883 rcv_space:43690
> >>>
> >>>
> >>> When sending larger messages, the EAGAIN only is seen after 2116 writes.
> >>> # ./tcp_client_server 108
> >>> using data size 108
> >>> client send buffer size 5000
> >>> client socket snd_buf: 10000
> >>> ERRNO = 11 count=2116
> >>>
> >>> Again, the ss shows all data being present on the server side (108 *
> >>> 2116 = 228528)
> >>> ESTAB      228528 0      127.0.0.1:x11
> >>> 127.0.0.1:59830
> >>> skmem:(r976896,rb1061488,t0,tb2626560,f2048,w0,o0,bl0,d1) cubic
> >>> wscale:7,7 rto:200 ato:80 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> >>> bytes_received:228528 segs_out:436 segs_in:2119 lastsnd:3615
> >>> lastrcv:3606 lastack:3596 rcv_rtt:1 rcv_space:43690
> >>> --
> >>> ESTAB      0      0      127.0.0.1:59830                127.0.0.1:x11
> >>> skmem:(r0,rb1061296,t0,tb10000,f0,w0,o0,bl0,d0) cubic wscale:7,7
> >>> rto:206 rtt:5.016/9.996 mss:21888 rcvmss:536 advmss:65483 cwnd:10
> >>> bytes_acked:228529 segs_out:2119 segs_in:437 send 349.1Mbps
> >>> lastsnd:3596 lastrcv:771704718 lastack:3566 pacing_rate 698.1Mbps
> >>> retrans:0/1 rcv_space:43690
> >>>
> >>>
> >>>
> >>> When I enlarge the SNDBUF on the client side to for example 10K, I see
> >>> that more messages can be sent:
> >>> # ./tcp_client_server 106 10000
> >>> using data size 106
> >>> client send buffer size 10000
> >>> client socket snd_buf: 20000
> >>> ERRNO = 11 count=1291
> >>>
> >>> I also captured the packets on the interface using wireshark and it
> >>> looks like the error is returned after the TCP layer has done a
> >>> retransmit (after 10ms) of the last packet sent on the connection.
> >>>
> >>> 10:12:38.186451 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1165:1265, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.186461 IP localhost.etlservicemgr > localhost.48470: Flags
> >>> [.], ack 1265, win 342, options [nop,nop,TS val 593860562 ecr
> >>> 593860562], length 0
> >>> 10:12:38.186478 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1265:1365, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.186488 IP localhost.etlservicemgr > localhost.48470: Flags
> >>> [.], ack 1365, win 342, options [nop,nop,TS val 593860562 ecr
> >>> 593860562], length 0
> >>> 10:12:38.186505 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1365:1465, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.186516 IP localhost.etlservicemgr > localhost.48470: Flags
> >>> [.], ack 1465, win 342, options [nop,nop,TS val 593860562 ecr
> >>> 593860562], length 0
> >>> 10:12:38.186533 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1465:1565, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.186543 IP localhost.etlservicemgr > localhost.48470: Flags
> >>> [.], ack 1565, win 342, options [nop,nop,TS val 593860562 ecr
> >>> 593860562], length 0
> >>> 10:12:38.186560 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1565:1665, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.186578 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1665:1765, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.186595 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1765:1865, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.186615 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1865:1965, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.186632 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
> >>> 593860562 ecr 593860562], length 100
> >>> 10:12:38.196064 IP localhost.48470 > localhost.etlservicemgr: Flags
> >>> [P.], seq 1965:2065, ack 165, win 342, options [nop,nop,TS val
> >>> 593860572 ecr 593860562], length 100
> >>> 10:12:38.196128 IP localhost.etlservicemgr > localhost.48470: Flags
> >>> [.], ack 2065, win 342, options [nop,nop,TS val 593860572 ecr
> >>> 593860562,nop,nop,sack 1 {1965:2065}], length 0
> >>>
> >>> Now my question is: how is it possible that I can only send 21
> >>> messages of 106 bytes before I see the EAGAIN while when sending
> >>> larger messages I can send 2K+ messages.
> >>
> >> This is because kernel tracks kernel memory usage.
> >>
> >> Small packets incur more overhead.
> >>
> >> The doubling of SO_SNDBUF user value, is an heuristic based on the fact
> >> the kernel
> >>
> >> uses a 2x overhead estimation,
> >>
> >> while effective overhead can vary from ~1.001x to ~768x depending on
> >> number of bytes per skb.
> >>
> > Hi Eric,
> >
> > thanks for the feedback but it is not completely clear to me.
> >
> > The small SNDBUF is on the client side and a large part of the data
> > has already ACKed by the receiving side.
> > Only the last 5 or 6 messages are waiting to be ACKed.
> > So I would expect that the biggest part of the overhead is located on
> > the receiving side where there is sufficient memory space (default
> > RCVBUF size is used).
> >
> > If the 5 queued packets on the sending side would cause the EAGAIN
> > issue, the real question maybe is why the receiving side is not
> > sending the ACK within the 10ms while for earlier messages the ACK is
> > sent much sooner.
>
>
> delayed acks on receiver, when receiving small packet(s)
>

Great feedback.

>
> >
> > Do you have any clue which kernel function does generate this EAGAIN?
> > And what would be the correct solution to this problem?
>
>
> I do not really see what is the problem you have.

The actual problem I have is that the application behavior is
completely different depending on the message size.
I created the test application to investigate something we see in the
real application, namely that libevent either queues a lot or does not
queue anything.
Libevent starts to queue data as soon as it sees the EAGAIN error.

>
> Perhaps you should not set SO_SNDBUF, and let the kernel decide (auto
> tune, and auto cork)
>

I think this might indeed be a solution: just remove the SO_SNDBUF
setting, or at least use a much larger value.

> >
> > Best regards,
> > Ronny
> >


Thread overview: 9+ messages
2022-06-10 12:48 TCP socket send return EAGAIN unexpectedly when sending small fragments Ronny Meeus
2022-06-10 14:21 ` Eric Dumazet
2022-06-10 15:16   ` Ronny Meeus
2022-06-10 15:21     ` David Laight
2022-06-10 17:16       ` Ronny Meeus
2022-06-10 17:42         ` Willy Tarreau
2022-06-10 18:14           ` Ronny Meeus
2022-06-10 17:46     ` Eric Dumazet
2022-06-10 18:21       ` Ronny Meeus
