* socket leaks observed in Linux kernel's passive close path
From: Arankal, Nagaraj @ 2022-10-13 6:47 UTC
To: netdev
Description:
We have observed a strange race condition in which sockets are not freed in the kernel under the conditions described below.
We have a kernel module that monitors TCP connection state changes. As part of its functionality, it replaces the default sk_destruct function of all TCP sockets with a module-specific routine. It looks like sk_destruct() is not invoked in the following scenario, and hence the sockets are leaked despite a RESET being received from the remote end.
1. Establish a TCP connection between Host A and Host B.
2. Have the client on Host B initiate close() immediately after the 3-way handshake.
3. The server sends a large amount of data to the client and then calls close() on the FD.
4. The FIN from the client is not ACKed while the server is busy sending data.
5. A RESET is received from the remote client.
6. sk_destruct() is not invoked because sk_refcnt or sk_wmem_alloc is non-zero.
Kernel version: Debian Linux 4.19.y(238,247)
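Step 6 depends on two counters, so here is a minimal userspace model of how they gate the destructor. This is a sketch, not kernel code: the `model_*` names are hypothetical, and the behaviour is a simplification of sock_put(), sk_free() and sock_wfree() as I understand them in mainline (sk_wmem_alloc starts at 1, each queued skb adds its truesize, and the destructor runs only when the count reaches zero). The truesize of 2 is an assumption chosen so the counters land on the values seen in the dump; mainline TCP accounts a pure ACK skb with truesize 2, so a count of 3 would be consistent with one pure ACK still outstanding, but the dump alone does not prove that.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace model of struct sock lifetime; NOT kernel code. */
struct model_sock {
    int refcnt;       /* models sk_refcnt */
    int wmem_alloc;   /* models sk_wmem_alloc (includes an initial bias of 1) */
    bool destructed;  /* set once the destructor has run */
    void (*destruct)(struct model_sock *sk); /* models sk_destruct */
};

/* Stand-in for a module-specific destructor. */
void model_destruct(struct model_sock *sk)
{
    sk->destructed = true;
}

void model_sock_init(struct model_sock *sk)
{
    sk->refcnt = 1;
    sk->wmem_alloc = 1; /* bias held by the socket itself */
    sk->destructed = false;
    sk->destruct = model_destruct;
}

/* Models sk_free(): drop the initial bias on sk_wmem_alloc. */
void model_sk_free(struct model_sock *sk)
{
    assert(sk->wmem_alloc > 0);
    if (--sk->wmem_alloc == 0)
        sk->destruct(sk);
}

/* Models sock_put(): the last reference triggers sk_free(). */
void model_sock_put(struct model_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        model_sk_free(sk);
}

/* Models handing an skb to the transmit path: charge its truesize. */
void model_charge_skb(struct model_sock *sk, int truesize)
{
    sk->wmem_alloc += truesize;
}

/* Models sock_wfree(): the skb was freed, so uncharge its truesize. */
void model_uncharge_skb(struct model_sock *sk, int truesize)
{
    assert(sk->wmem_alloc >= truesize);
    sk->wmem_alloc -= truesize;
    if (sk->wmem_alloc == 0)
        sk->destruct(sk);
}
```

In this model, calling model_charge_skb(&sk, 2) on a fresh socket gives refcnt 1 and wmem_alloc 3. A later model_sock_put() alone does not run the destructor; it runs only after the matching model_uncharge_skb(). So a single skb that is never freed, or a single reference that is never put, is enough to keep the destructor from running.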
Please find the tcpdump capture below:
No. Source Destination Protocol Info
97 10.10.10.41 10.10.10.21 TCP [TCP Port numbers reused] 33968 → 6570 [SYN] Seq=74596442 Win=43800 Len=0 MSS=1460 SACK_PERM=1 TSval=466120930 TSecr=0 WS=32
98 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [SYN, ACK] Seq=2529360114 Ack=74596443 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=2085271968 TSecr=466120930 WS=32
99 10.10.10.41 10.10.10.21 TCP 33968 → 6570 [ACK] Seq=74596443 Ack=2529360115 Win=43808 Len=0 TSval=466120930 TSecr=2085271968
100 10.10.10.41 10.10.10.21 TCP 33968 → 6570 [FIN, ACK] Seq=74596443 Ack=2529360115 Win=43808 Len=0 TSval=466120930 TSecr=2085271968
101 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529360115 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
102 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529361563 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
103 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529363011 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
104 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529364459 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
105 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529365907 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
106 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529367355 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
107 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529368803 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
108 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529370251 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
109 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529371699 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
110 10.10.10.21 10.10.10.41 TCP 6570 → 33968 [ACK] Seq=2529373147 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
111 10.10.10.41 10.10.10.21 TCP 33968 → 6570 [RST] Seq=74596443 Win=0 Len=0
112 10.10.10.41 10.10.10.21 TCP 33968 → 6570 [RST] Seq=74596443 Win=0 Len=0
113 10.10.10.41 10.10.10.21 TCP 33968 → 6570 [RST] Seq=74596443 Win=0 Len=0
114 10.10.10.41 10.10.10.21 TCP 33968 → 6570 [RST] Seq=74596443 Win=0 Len=0
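The exchange in the capture (handshake, immediate FIN from the client, a burst of data from the server, then RST) could be driven from userspace roughly as follows. This is only a sketch under assumptions: it runs both ends in one process over loopback instead of two hosts, the function name `drive_passive_close` is hypothetical, and it is not claimed to reproduce the leak, since timing on loopback differs from the setup described above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns the number of bytes send() accepted before it failed or the
 * requested total was reached, or -1 if the sockets could not be set up.
 */
long drive_passive_close(size_t total_bytes)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { .tv_sec = 2, .tv_usec = 0 };
    char buf[1448 * 10];
    long sent = 0;
    int lfd, cfd, sfd;

    assert(total_bytes > 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; /* let the kernel pick a free port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0) {
        close(lfd);
        return -1;
    }

    /* Steps 1 and 2: the client connects, then closes right away. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (cfd >= 0)
            close(cfd);
        close(lfd);
        return -1;
    }
    close(cfd);

    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0) {
        close(lfd);
        return -1;
    }
    /* Bound the time a blocked send() can wait. */
    setsockopt(sfd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));

    /* Step 3: the server pushes data at the already closed client. */
    memset(buf, 'x', sizeof(buf));
    while ((size_t)sent < total_bytes) {
        ssize_t n = send(sfd, buf, sizeof(buf), MSG_NOSIGNAL);
        if (n < 0)
            break; /* typically fails once the RST has arrived */
        sent += n;
    }

    /* Step 3, second half: the server closes its FD. */
    close(sfd);
    close(lfd);
    return sent;
}
```

MSG_NOSIGNAL is used so that a send() after the reset returns an error instead of raising SIGPIPE. A caller would invoke it as, for example, drive_passive_close(1 << 20) and then inspect the socket state with the usual tools.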
Inspecting the state of one of the leaked sockets:
crash> p *(struct sock *) 0xffff926f465aa200| grep state
skc_state = 7 '\a', << TCP_CLOSE
..
skc_refcnt = {
refs = {
counter = 1
....
sk_wmem_alloc = {
refs = {
counter = 3
sk_err = 104,
sk_destruct = 0xffffffffc06d6240 <socket_destruct_func>,
}
tcp_header_len = 32,
gso_segs = 15,
pred_flags = 1493504128,
bytes_received = 1,
segs_in = 4,
data_segs_in = 0,
rcv_nxt = 74596444,
copied_seq = 74596443,
rcv_wup = 74596444,
snd_nxt = 2529374595,
segs_out = 11,
data_segs_out = 10,
bytes_sent = 14480,
bytes_acked = 0,
dsack_dups = 0,
snd_una = 2529360115,
snd_sml = 2529360115,
rcv_tstamp = 521240444,
lsndtime = 521240445,
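As a cross-check of the dump against the capture, the sequence fields can be compared with the byte counters. The sketch below uses only values copied from above; the helper names are hypothetical, and the 1448-byte payload size is taken from the Len field in the tcpdump output. The subtraction is done in unsigned 32-bit arithmetic, which stays correct across sequence number wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the crash dump and the capture above. */
#define DUMP_SND_UNA    2529360115u
#define DUMP_SND_NXT    2529374595u
#define DUMP_BYTES_SENT 14480u
#define CAPTURE_PAYLOAD 1448u

struct unacked_summary {
    uint32_t unacked_bytes; /* snd_nxt - snd_una */
    uint32_t full_segments; /* whole payload-sized segments in flight */
    uint32_t remainder;     /* bytes left over after whole segments */
};

/* Wraparound-safe distance between two TCP sequence numbers. */
uint32_t seq_distance(uint32_t later, uint32_t earlier)
{
    return later - earlier;
}

struct unacked_summary summarize_unacked(uint32_t snd_una, uint32_t snd_nxt,
                                         uint32_t payload)
{
    struct unacked_summary s;

    assert(payload > 0);
    s.unacked_bytes = seq_distance(snd_nxt, snd_una);
    s.full_segments = s.unacked_bytes / payload;
    s.remainder = s.unacked_bytes % payload;
    return s;
}
```

With the values above, summarize_unacked(DUMP_SND_UNA, DUMP_SND_NXT, CAPTURE_PAYLOAD) gives 14480 unacknowledged bytes, that is, 10 full segments and no remainder. This agrees with bytes_sent = 14480, data_segs_out = 10 and bytes_acked = 0: none of the ten data segments in the capture was acknowledged before the RST arrived.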
Regards,
Nagaraj P Arankal
* Re: socket leaks observed in Linux kernel's passive close path
From: Andrew Lunn @ 2022-10-13 14:20 UTC
To: Arankal, Nagaraj; +Cc: netdev
On Thu, Oct 13, 2022 at 06:47:56AM +0000, Arankal, Nagaraj wrote:
> Description:
> We have observed a strange race condition in which sockets are not freed in the kernel under the conditions described below.
> We have a kernel module that monitors TCP connection state changes. As part of its functionality, it replaces the default sk_destruct function of all TCP sockets with a module-specific routine. It looks like sk_destruct() is not invoked in the following scenario, and hence the sockets are leaked despite a RESET being received from the remote end.
>
> 1. Establish a TCP connection between Host A and Host B.
> 2. Have the client on Host B initiate close() immediately after the 3-way handshake.
> 3. The server sends a large amount of data to the client and then calls close() on the FD.
> 4. The FIN from the client is not ACKed while the server is busy sending data.
> 5. A RESET is received from the remote client.
> 6. sk_destruct() is not invoked because sk_refcnt or sk_wmem_alloc is non-zero.
>
> Kernel version: Debian Linux 4.19.y(238,247)
Is this reproducible with a modern kernel? v6.0? If this is already
fixed, we need to identify what change fixed it, and get it back
ported. If it is broken in v6.0, and net-next, it first needs fixing
in net-next, and then back porting to the different LTS kernels.
Andrew
* RE: socket leaks observed in Linux kernel's passive close path
From: Arankal, Nagaraj @ 2022-10-13 14:44 UTC
To: Andrew Lunn; +Cc: netdev
Hi Andrew,
Thanks for looking into this. I have not tested this on a v6.0 kernel. As far as I know, there have been no fixes in this area, which is why I posted this; it seems to be a valid case.
Thanks,
Nagaraj P Arankal
* Re: socket leaks observed in Linux kernel's passive close path
From: Andrew Lunn @ 2022-10-13 15:59 UTC
To: Arankal, Nagaraj; +Cc: netdev
On Thu, Oct 13, 2022 at 02:44:02PM +0000, Arankal, Nagaraj wrote:
> Hi Andrew,
> Thanks for looking into this. I have not tested this on a v6.0 kernel. As far as I know, there have been no fixes in this area, which is why I posted this; it seems to be a valid case.
Please don't top post. And set your mailer to wrap lines at around 78
characters.
Please post your test results for v6.0. Just because you have not seen
any fixes in the last 4 years does not mean it has not been fixed.
Andrew