* TCP stack gets into state of continually advertising “silly window” size of 1
[not found] <BY3PR05MB8002750FAB3DC34F3B18AD9AD0E79@BY3PR05MB8002.namprd05.prod.outlook.com>
@ 2022-04-06 14:19 ` Erin MacNeil
2022-04-06 17:40 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Erin MacNeil @ 2022-04-06 14:19 UTC (permalink / raw)
To: netdev
This issue has been observed with the 4.8.28 kernel; I am wondering whether it is a known issue with an available fix.
Description:
Device A hosting IP address <Device A i/f addr> is running Linux version: 4.8.28, and device B hosting IP address <Device B i/f addr> is non-Linux based.
Both devices are configured with an interface MTU of 9114 bytes.
The TCP connection gets established via frames 1418-1419, where a window size and MSS of 9060 are agreed upon; SACK is disabled because device B does not support it, and window scaling is not in play.
No. Time Source Destination Protocol Length Info
*1418 2022-03-15 06:52:49.693168 <Device A i/f addr> <Device B i/f addr> TCP 122 57486 -> 179 [SYN] Seq=0 Win=9060 Len=0 MSS=9060 SACK_PERM=1 TSval=3368771415 TSecr=0 WS=1
*1419 2022-03-15 06:52:49.709325 <Device B i/f addr> <Device A i/f addr> TCP 132 179 -> 57486 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=9060 WS=1
...
4661 2022-03-15 06:53:52.437668 <Device B i/f addr> <Device A i/f addr> BGP 9184
4662 2022-03-15 06:53:52.437747 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9658065 Win=9060 Len=0
4663 2022-03-15 06:53:52.454599 <Device B i/f addr> <Device A i/f addr> BGP 9184
4664 2022-03-15 06:53:52.454661 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9667125 Win=9060 Len=0
4665 2022-03-15 06:53:52.471377 <Device B i/f addr> <Device A i/f addr> BGP 9184
4666 2022-03-15 06:53:52.512396 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676185 Win=0 Len=0
4667 2022-03-15 06:53:52.828918 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676185 Win=9060 Len=0
4668 2022-03-15 06:53:52.829001 <Device B i/f addr> <Device A i/f addr> BGP 125
4669 2022-03-15 06:53:52.829032 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676186 Win=9060 Len=0
4670 2022-03-15 06:53:52.845494 <Device B i/f addr> <Device A i/f addr> BGP 9184
*4671 2022-03-15 06:53:52.845532 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9685245 Win=1 Len=0
4672 2022-03-15 06:53:52.861968 <Device B i/f addr> <Device A i/f addr> TCP 125 179 -> 57486 [ACK] Seq=9685245 Ack=3177223 Win=27803 Len=1
...
At frame 4671, some 63 seconds after the connection has been established, device A advertises a window size of 1, and the connection never recovers from this; a window size of 1 is continually advertised. The issue seems to be triggered by device B sending a TCP window probe conveying a single byte of data (the next byte in its send window) in frame 4668; when this is ACKed by device A, device A also re-advertises its receive window as 9060. The next packet from device B, frame 4670, conveys 9060 bytes of data, the first byte of which is the same byte that it sent in frame 4668. Device A has already ACKed that byte, but device B may not yet have seen the ACK.
On device A, the TCP socket was configured with setsockopt() SO_RCVBUF & SO_SNDBUF values of 16k.
Here is the sequence detail:
|2022-03-15 06:53:52.437668| ACK - Len: 9060 |Seq = 4236355144 Ack = 502383504 | |(57486) <------------------ (179) |
|2022-03-15 06:53:52.437747| ACK | |Seq = 502383551 Ack = 4236364204 | |(57486) ------------------> (179) |
|2022-03-15 06:53:52.454599| ACK - Len: 9060 |Seq = 4236364204 Ack = 502383551 | |(57486) <------------------ (179) |
|2022-03-15 06:53:52.454661| ACK | |Seq = 502383551 Ack = 4236373264 | |(57486) ------------------> (179) |
|2022-03-15 06:53:52.471377| ACK - Len: 9060 |Seq = 4236373264 Ack = 502383551 | |(57486) <------------------ (179) |
|2022-03-15 06:53:52.512396| ACK | |Seq = 502383551 Ack = 4236382324 | |(57486) ------------------> (179) |
|2022-03-15 06:53:52.828918| ACK | |Seq = 502383551 Ack = 4236382324 | |(57486) ------------------> (179) |
|2022-03-15 06:53:52.829001| ACK - Len: 1 |Seq = 4236382324 Ack = 502383551 | |(57486) <------------------ (179) |
|2022-03-15 06:53:52.829032| ACK | |Seq = 502383551 Ack = 4236382325 | |(57486) ------------------> (179) |
|2022-03-15 06:53:52.845494| ACK - Len: 9060 |Seq = 4236382324 Ack = 502383551 | |(57486) <------------------ (179) |
|2022-03-15 06:53:52.845532| ACK | |Seq = 502383551 Ack = 4236391384 | |(57486) ------------------> (179) |
|2022-03-15 06:53:52.861968| ACK - Len: 1 |Seq = 4236391384 Ack = 502383551 | |(57486) <------------------ (179) |
|2022-03-15 06:53:52.862022| ACK | |Seq = 502383551 Ack = 4236391385 | |(57486) ------------------> (179) |
|2022-03-15 06:53:52.878445| ACK - Len: 1 |Seq = 4236391385 Ack = 502383551 | |(57486) <------------------ (179) |
|2022-03-15 06:53:52.878529| ACK | |Seq = 502383551 Ack = 4236391386 | |(57486) ------------------> (179) |
|2022-03-15 06:53:52.895212| ACK - Len: 1 |Seq = 4236391386 Ack = 502383551 | |(57486) <------------------ (179) |
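As a rough cross-check of the sequence detail above, the arithmetic can be redone in a few lines of Python. The numbers are copied from the trace; the variable and helper names are made up for this illustration. The sketch only shows that the advertised window of 1 equals the distance to the right edge device A had already advertised when it ACKed the probe. That reading is an interpretation of the trace, not a confirmed description of what the kernel does, and it does not explain why the window fails to re-open afterwards.

```python
# Values copied from the sequence detail above (absolute sequence numbers).
ACK_AFTER_PROBE = 4236382325  # device A's ACK of the 1-byte probe (06:53:52.829032)
WIN_AFTER_PROBE = 9060        # window advertised in that ACK (frame 4669)
SEG_SEQ = 4236382324          # start of the next 9060-byte segment (06:53:52.845494)
SEG_LEN = 9060

MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def seq_add(seq, n):
    """Advance a sequence number by n bytes, with 32-bit wraparound."""
    return (seq + n) % MOD


def seq_diff(a, b):
    """Distance from b up to a in sequence space, with 32-bit wraparound."""
    return (a - b) % MOD


# Right edge of the window device A offered when it ACKed the probe.
right_edge = seq_add(ACK_AFTER_PROBE, WIN_AFTER_PROBE)

# First sequence number after the 9060-byte segment.
seg_end = seq_add(SEG_SEQ, SEG_LEN)

# Bytes of the segment that device A had already ACKed (the probe byte).
overlap = seq_diff(ACK_AFTER_PROBE, SEG_SEQ)

# Space left between the end of the segment and the offered right edge.
remaining = seq_diff(right_edge, seg_end)

print("right edge :", right_edge)
print("segment end:", seg_end)
print("overlap    :", overlap)
print("remaining  :", remaining)
```

The computed segment end matches the Ack value device A sends at 06:53:52.845532, and the remaining space matches the Win=1 seen in frame 4671.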
There is no data in the recv-q or send-q at this point, yet the window stays at size 1:
$ ss -o state established -ntepi '( dport = 179 or sport = 179 )' dst <Device B i/f addr>
Recv-Q Send-Q Local Address:Port Peer Address:Port
0 0 <Device A i/f addr>:57486 <Device B i/f addr>:179 ino:1170981660 sk:d9d <->
Thanks
-Erin
--
Juniper Business Use Only
* Re: TCP stack gets into state of continually advertising “silly window” size of 1
2022-04-06 14:19 ` TCP stack gets into state of continually advertising “silly window” size of 1 Erin MacNeil
@ 2022-04-06 17:40 ` Eric Dumazet
0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2022-04-06 17:40 UTC (permalink / raw)
To: Erin MacNeil, netdev
On 4/6/22 07:19, Erin MacNeil wrote:
> This issue has been observed with the 4.8.28 kernel, I am wondering if it may be a known issue with an available fix?
>
> Description:
> Device A hosting IP address <Device A i/f addr> is running Linux version: 4.8.28, and device B hosting IP address <Device B i/f addr> is non-Linux based.
> Both devices are configured with an interface MTU of 9114 bytes.
>
> The TCP connection gets established via frames 1418-1419, where a window size + MSS of 9060 is agreed upon; SACK is disabled as device B does not support it + window scaling is not in play.
>
> No. Time Source Destination Protocol Length Info
> *1418 2022-03-15 06:52:49.693168 <Device A i/f addr> <Device B i/f addr> TCP 122 57486 -> 179 [SYN] Seq=0 Win=9060 Len=0 MSS=9060 SACK_PERM=1 TSval=3368771415 TSecr=0 WS=1
> *1419 2022-03-15 06:52:49.709325 <Device B i/f addr> <Device A i/f addr> TCP 132 179 -> 57486 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=9060 WS=1
> ...
> 4661 2022-03-15 06:53:52.437668 <Device B i/f addr> <Device A i/f addr> BGP 9184
> 4662 2022-03-15 06:53:52.437747 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9658065 Win=9060 Len=0
> 4663 2022-03-15 06:53:52.454599 <Device B i/f addr> <Device A i/f addr> BGP 9184
> 4664 2022-03-15 06:53:52.454661 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9667125 Win=9060 Len=0
> 4665 2022-03-15 06:53:52.471377 <Device B i/f addr> <Device A i/f addr> BGP 9184
> 4666 2022-03-15 06:53:52.512396 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676185 Win=0 Len=0
> 4667 2022-03-15 06:53:52.828918 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676185 Win=9060 Len=0
> 4668 2022-03-15 06:53:52.829001 <Device B i/f addr> <Device A i/f addr> BGP 125
> 4669 2022-03-15 06:53:52.829032 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9676186 Win=9060 Len=0
> 4670 2022-03-15 06:53:52.845494 <Device B i/f addr> <Device A i/f addr> BGP 9184
> *4671 2022-03-15 06:53:52.845532 <Device A i/f addr> <Device B i/f addr> TCP 102 57486 -> 179 [ACK] Seq=3177223 Ack=9685245 Win=1 Len=0
> 4672 2022-03-15 06:53:52.861968 <Device B i/f addr> <Device A i/f addr> TCP 125 179 -> 57486 [ACK] Seq=9685245 Ack=3177223 Win=27803 Len=1
> ...
> At frame 4671, some 63 seconds after the connection has been established, device A advertises a window size of 1, and the connection never recovers from this; a window size of 1 is continually advertised. The issue seems to be triggered by device B sending a TCP window probe conveying a single byte of data (the next byte in its send window) in frame 4668; when this is ACKed by device A, device A also re-advertises its receive window as 9060. The next packet from device B, frame 4670, conveys 9060 bytes of data, the first byte of which is the same byte that it sent in frame 4668 which device A has already ACKed, but which device B may not yet have seen.
>
> On device A, the TCP socket was configured with setsockopt() SO_RCVBUF & SO_SNDBUF values of 16k.
Presumably 16k buffers while MTU is 9000 is not correct.
Kernel has some logic to ensure a minimal value, based on standard MTU
sizes.
Have you tried not using setsockopt() SO_RCVBUF & SO_SNDBUF ?
>
> Here is the sequence detail:
>
> |2022-03-15 06:53:52.437668| ACK - Len: 9060 |Seq = 4236355144 Ack = 502383504 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.437747| ACK | |Seq = 502383551 Ack = 4236364204 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.454599| ACK - Len: 9060 |Seq = 4236364204 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.454661| ACK | |Seq = 502383551 Ack = 4236373264 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.471377| ACK - Len: 9060 |Seq = 4236373264 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.512396| ACK | |Seq = 502383551 Ack = 4236382324 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.828918| ACK | |Seq = 502383551 Ack = 4236382324 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.829001| ACK - Len: 1 |Seq = 4236382324 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.829032| ACK | |Seq = 502383551 Ack = 4236382325 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.845494| ACK - Len: 9060 |Seq = 4236382324 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.845532| ACK | |Seq = 502383551 Ack = 4236391384 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.861968| ACK - Len: 1 |Seq = 4236391384 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.862022| ACK | |Seq = 502383551 Ack = 4236391385 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.878445| ACK - Len: 1 |Seq = 4236391385 Ack = 502383551 | |(57486) <------------------ (179) |
> |2022-03-15 06:53:52.878529| ACK | |Seq = 502383551 Ack = 4236391386 | |(57486) ------------------> (179) |
> |2022-03-15 06:53:52.895212| ACK - Len: 1 |Seq = 4236391386 Ack = 502383551 | |(57486) <------------------ (179) |
>
>
> There is no data in the recv-q or send-q at this point, yet the window stays at size 1:
>
> $ ss -o state established -ntepi '( dport = 179 or sport = 179 )' dst <Device B i/f addr>
> Recv-Q Send-Q Local Address:Port Peer Address:Port
> 0 0 <Device A i/f addr>:57486 <Device B i/f addr>:179 ino:1170981660 sk:d9d <->
>
>
> Thanks
> -Erin
>
> --
>
> Juniper Business Use Only
* Re: TCP stack gets into state of continually advertising “silly window” size of 1
2022-04-08 1:10 ` Erin MacNeil
@ 2022-04-08 22:44 ` Eric Dumazet
0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2022-04-08 22:44 UTC (permalink / raw)
To: Erin MacNeil, netdev
On 4/7/22 18:10, Erin MacNeil wrote:
>
>
> On 2022-04-07 4:31 p.m., Eric Dumazet wrote:
>> On 4/7/22 10:57, Erin MacNeil wrote:
>>> In-Reply-To:
>>> <BY3PR05MB80023CD8700DA1B1F203A975D0E79@BY3PR05MB8002.namprd05.prod.outlook.com>
>>>
>>>
>>>> On 4/6/22 10:40, Eric Dumazet wrote:
>>>>> On 4/6/22 07:19, Erin MacNeil wrote:
>>>>> This issue has been observed with the 4.8.28 kernel, I am
>>>>> wondering if it may be a known issue with an available fix?
>>>>>
> ...
>>>
>>>> Presumably 16k buffers while MTU is 9000 is not correct.
>>>>
>>>> Kernel has some logic to ensure a minimal value, based on standard MTU
>>>> sizes.
>>>>
>>>>
>>>> Have you tried not using setsockopt() SO_RCVBUF & SO_SNDBUF ?
>>> Yes, a temporary workaround for the issue is to increase the value
>>> of SO_SNDBUF which reduces the likelihood of device A’s receive
>>> window dropping to 0, and hence device B sending problematic TCP
>>> window probes.
>>>
>>
>> Not sure how 'temporary' it is.
>>
>> For ABI reason, and the fact that setsockopt() can be performed
>> _before_ the connect() or accept() is done, thus before knowing MTU
>> size, we can not after the MTU is known increase buffers, as it might
>>
>> break some applications expecting getsockopt() to return a stable value
>> (if a prior setsockopt() has set a value)
>>
>> If we chose to increase minimal limits, I think some users might
>> complain.
>>
>
> Is this not a TCP bug though? The stream was initially working "ok"
> until the window closed. There is no data in the socket queue;
> should the window not re-open to where it had been?
We do not want to deal with a user forcing the TCP stack into a stupid
ping-pong mode, one packet at a time.
If you have a reasonable patch, please let us know, but I bet
it will break some applications.
Adding code to the Linux TCP fast path to test for conditions that
will not trigger 99.9999999% of the time makes little sense.
MTU=9000 is 6 times bigger than MTU=1500; make sure you have increased
the SO_XXX values by 6x.
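The scaling advice above could be applied along these lines. This is only a sketch: the helper names `scaled_buffer_size` and `apply_buffer_sizes` are hypothetical, the 1500-byte baseline is an assumption, and rounding the factor up is a choice made here rather than anything stated in the thread. The 16k starting value and the 9114-byte interface MTU come from the original report.

```python
import socket

STANDARD_MTU = 1500  # assumption: the MTU the original 16k values were sized for


def scaled_buffer_size(base_bytes, mtu, standard_mtu=STANDARD_MTU):
    """Scale a buffer size by the MTU ratio, rounded up to a whole factor."""
    factor = -(-mtu // standard_mtu)  # ceiling division
    return base_bytes * factor


def apply_buffer_sizes(sock, rcvbuf, sndbuf):
    """Request the sizes, then return what the kernel reports afterwards."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
    )


if __name__ == "__main__":
    # 16k was the value in the report; 9114 is the interface MTU from the report.
    wanted = scaled_buffer_size(16 * 1024, mtu=9114)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        reported = apply_buffer_sizes(sock, wanted, wanted)
        print("requested:", wanted, "reported (rcv, snd):", reported)
```

On Linux the reported values will usually differ from the requested ones: socket(7) documents that the kernel doubles the value passed to setsockopt() and caps it at net.core.rmem_max / net.core.wmem_max, so it is worth reading the result back rather than assuming the request was honoured.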
* Re: TCP stack gets into state of continually advertising “silly window” size of 1
2022-04-07 20:31 ` Eric Dumazet
@ 2022-04-08 1:10 ` Erin MacNeil
2022-04-08 22:44 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Erin MacNeil @ 2022-04-08 1:10 UTC (permalink / raw)
To: Eric Dumazet, netdev
On 2022-04-07 4:31 p.m., Eric Dumazet wrote:
> On 4/7/22 10:57, Erin MacNeil wrote:
>> In-Reply-To:
>> <BY3PR05MB80023CD8700DA1B1F203A975D0E79@BY3PR05MB8002.namprd05.prod.outlook.com>
>>
>>
>>> On 4/6/22 10:40, Eric Dumazet wrote:
>>>> On 4/6/22 07:19, Erin MacNeil wrote:
>>>> This issue has been observed with the 4.8.28 kernel, I am wondering
>>>> if it may be a known issue with an available fix?
>>>>
...
>>
>>> Presumably 16k buffers while MTU is 9000 is not correct.
>>>
>>> Kernel has some logic to ensure a minimal value, based on standard MTU
>>> sizes.
>>>
>>>
>>> Have you tried not using setsockopt() SO_RCVBUF & SO_SNDBUF ?
>> Yes, a temporary workaround for the issue is to increase the value of
>> SO_SNDBUF which reduces the likelihood of device A’s receive window
>> dropping to 0, and hence device B sending problematic TCP window probes.
>>
>
> Not sure how 'temporary' it is.
>
> For ABI reason, and the fact that setsockopt() can be performed
> _before_ the connect() or accept() is done, thus before knowing MTU
> size, we can not after the MTU is known increase buffers, as it might
>
> break some applications expecting getsockopt() to return a stable value
> (if a prior setsockopt() has set a value)
>
> If we chose to increase minimal limits, I think some users might complain.
>
Is this not a TCP bug though? The stream was initially working "ok"
until the window closed. There is no data in the socket queue;
should the window not re-open to where it had been?
Thanks
-Erin
* Re: TCP stack gets into state of continually advertising “silly window” size of 1
2022-04-07 17:57 Erin MacNeil
@ 2022-04-07 20:31 ` Eric Dumazet
2022-04-08 1:10 ` Erin MacNeil
0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2022-04-07 20:31 UTC (permalink / raw)
To: Erin MacNeil, netdev
On 4/7/22 10:57, Erin MacNeil wrote:
> In-Reply-To: <BY3PR05MB80023CD8700DA1B1F203A975D0E79@BY3PR05MB8002.namprd05.prod.outlook.com>
>
>> On 4/6/22 10:40, Eric Dumazet wrote:
>>> On 4/6/22 07:19, Erin MacNeil wrote:
>>> This issue has been observed with the 4.8.28 kernel, I am wondering if it may be a known issue with an available fix?
>>>
> ...
>
>>> At frame 4671, some 63 seconds after the connection has been established, device A advertises a window size of 1, and the connection never recovers from this; a window size of 1 is continually advertised. The issue seems to be triggered by device B sending a TCP window probe conveying a single byte of data (the next byte in its send window) in frame 4668; when this is ACKed by device A, device A also re-advertises its receive window as 9060. The next packet from device B, frame 4670, conveys 9060 bytes of data, the first byte of which is the same byte that it sent in frame 4668 which device A has already ACKed, but which device B may not yet have seen.
>>>
>>> On device A, the TCP socket was configured with setsockopt() SO_RCVBUF & SO_SNDBUF values of 16k.
> ...
>
>> Presumably 16k buffers while MTU is 9000 is not correct.
>>
>> Kernel has some logic to ensure a minimal value, based on standard MTU
>> sizes.
>>
>>
>> Have you tried not using setsockopt() SO_RCVBUF & SO_SNDBUF ?
> Yes, a temporary workaround for the issue is to increase the value of SO_SNDBUF which reduces the likelihood of device A’s receive window dropping to 0, and hence device B sending problematic TCP window probes.
>
Not sure how 'temporary' it is.
For ABI reasons, and because setsockopt() can be performed
_before_ connect() or accept(), and thus before the MTU is known,
we cannot increase the buffers once the MTU is known: that might
break applications that expect getsockopt() to return a stable value
(if a prior setsockopt() has set one).
If we chose to increase the minimal limits, I think some users might complain.
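The ordering described above can be illustrated with a small sketch, assuming a Linux host. The helper name `rcvbuf_readback` is made up for this example. It sets SO_RCVBUF on an unconnected socket, at a point where no peer and therefore no path MTU is known, and reads the value back twice to show what an application would observe through getsockopt().

```python
import socket


def rcvbuf_readback(requested):
    """Set SO_RCVBUF before any connect() and read it back twice."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Value the kernel picked on its own, before any explicit request.
        default = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

        # Explicit request made while the socket is still unconnected.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

        # Two reads of the same option, with no further setsockopt() between them.
        first = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        second = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return default, first, second


if __name__ == "__main__":
    default, first, second = rcvbuf_readback(16 * 1024)
    print("default:", default, "after set:", first, "read again:", second)
```

On Linux the value read back is normally double the request (see socket(7)), and an explicit SO_RCVBUF also turns off receive-buffer autotuning for that socket, which is one reason not setting the option at all is worth trying.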
* Re: TCP stack gets into state of continually advertising “silly window” size of 1
@ 2022-04-07 17:57 Erin MacNeil
2022-04-07 20:31 ` Eric Dumazet
0 siblings, 1 reply; 6+ messages in thread
From: Erin MacNeil @ 2022-04-07 17:57 UTC (permalink / raw)
To: eric.dumazet, netdev
In-Reply-To: <BY3PR05MB80023CD8700DA1B1F203A975D0E79@BY3PR05MB8002.namprd05.prod.outlook.com>
>On 4/6/22 10:40, Eric Dumazet wrote:
>>On 4/6/22 07:19, Erin MacNeil wrote:
>> This issue has been observed with the 4.8.28 kernel, I am wondering if it may be a known issue with an available fix?
>>
...
>> At frame 4671, some 63 seconds after the connection has been established, device A advertises a window size of 1, and the connection never recovers from this; a window size of 1 is continually advertised. The issue seems to be triggered by device B sending a TCP window probe conveying a single byte of data (the next byte in its send window) in frame 4668; when this is ACKed by device A, device A also re-advertises its receive window as 9060. The next packet from device B, frame 4670, conveys 9060 bytes of data, the first byte of which is the same byte that it sent in frame 4668 which device A has already ACKed, but which device B may not yet have seen.
>>
>> On device A, the TCP socket was configured with setsockopt() SO_RCVBUF & SO_SNDBUF values of 16k.
...
>Presumably 16k buffers while MTU is 9000 is not correct.
>
>Kernel has some logic to ensure a minimal value, based on standard MTU
>sizes.
>
>
>Have you tried not using setsockopt() SO_RCVBUF & SO_SNDBUF ?
Yes, a temporary workaround for the issue is to increase the value of SO_SNDBUF which reduces the likelihood of device A’s receive window dropping to 0, and hence device B sending problematic TCP window probes.