netdev.vger.kernel.org archive mirror
* socket leaks observed in Linux kernel's passive close path
From: Arankal, Nagaraj @ 2022-10-13  6:47 UTC (permalink / raw)
  To: netdev

Description:
We have observed a race condition in which the kernel does not free sockets in the scenario described below.
We have a kernel module that monitors TCP connection state changes. As part of its functionality, it replaces the default sk_destruct function of all TCP sockets with a module-specific routine. It looks like sk_destruct() is not invoked in the following scenario, so the sockets are leaked despite a RESET being received from the remote end.

1.	Establish a TCP connection between Host A and Host B.
2.	Have the client on Host B call close() immediately after the three-way handshake.
3.	The server end sends a large amount of data to the client and then calls close() on the FD.
4.	The FIN from the client is not ACKed while the server is busy sending the data.
5.	A RESET is received from the remote client.
6.	sk_destruct() is not invoked because sk_refcnt or sk_wmem_alloc is still non-zero.
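
For anyone who wants to drive the same packet sequence from user space, here is a minimal sketch of steps 1 to 5. It is an assumption-laden stand-in, not the test we ran: it uses loopback instead of two hosts, a kernel-chosen port instead of 6570, and the names PAYLOAD, serve_once and reproduce are made up for illustration. It only generates the traffic pattern; whether a socket is leaked still has to be checked separately (for example with crash, as further below).

```python
import socket
import threading

PAYLOAD = b"x" * (8 * 1024 * 1024)  # stand-in for "large amount of data"; size is arbitrary


def serve_once(listener, result):
    """Accept one connection, push PAYLOAD at it, then close (steps 3 to 5)."""
    conn, _ = listener.accept()
    try:
        conn.sendall(PAYLOAD)
        result["outcome"] = "sent"
    except OSError as exc:
        # Typically ConnectionResetError or BrokenPipeError once the RST arrives.
        result["outcome"] = type(exc).__name__
    finally:
        conn.close()


def reproduce(host="127.0.0.1"):
    """Run the sequence once on a single machine and report how the send ended."""
    result = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((host, 0))  # port 0: let the kernel pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(
            target=serve_once, args=(listener, result), daemon=True
        )
        server.start()
        # Steps 1 and 2: connect, then close right after the handshake completes.
        socket.create_connection((host, port)).close()
        server.join(timeout=30)
    return result.get("outcome")
```

reproduce() returns "sent" if the whole payload was queued before the reset was seen, or the name of the exception raised by sendall() otherwise.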

Kernel version: Debian Linux 4.19.y(238,247)

Please find the tcpdump capture below:

No.  Source       Destination  Protocol  Info
97   10.10.10.41  10.10.10.21  TCP       [TCP Port numbers reused] 33968 → 6570 [SYN] Seq=74596442 Win=43800 Len=0 MSS=1460 SACK_PERM=1 TSval=466120930 TSecr=0 WS=32
98   10.10.10.21  10.10.10.41  TCP       6570 → 33968 [SYN, ACK] Seq=2529360114 Ack=74596443 Win=65535 Len=0 MSS=1460 SACK_PERM=1 TSval=2085271968 TSecr=466120930 WS=32
99   10.10.10.41  10.10.10.21  TCP       33968 → 6570 [ACK] Seq=74596443 Ack=2529360115 Win=43808 Len=0 TSval=466120930 TSecr=2085271968
100  10.10.10.41  10.10.10.21  TCP       33968 → 6570 [FIN, ACK] Seq=74596443 Ack=2529360115 Win=43808 Len=0 TSval=466120930 TSecr=2085271968
101  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529360115 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
102  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529361563 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
103  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529363011 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
104  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529364459 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
105  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529365907 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
106  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529367355 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
107  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529368803 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
108  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529370251 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
109  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529371699 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
110  10.10.10.21  10.10.10.41  TCP       6570 → 33968 [ACK] Seq=2529373147 Ack=74596443 Win=65536 Len=1448 TSval=2085271969 TSecr=466120930
111  10.10.10.41  10.10.10.21  TCP       33968 → 6570 [RST] Seq=74596443 Win=0 Len=0
112  10.10.10.41  10.10.10.21  TCP       33968 → 6570 [RST] Seq=74596443 Win=0 Len=0
113  10.10.10.41  10.10.10.21  TCP       33968 → 6570 [RST] Seq=74596443 Win=0 Len=0
114  10.10.10.41  10.10.10.21  TCP       33968 → 6570 [RST] Seq=74596443 Win=0 Len=0


Inspecting the state of one of the leaked sockets:

crash> p *(struct sock *) 0xffff926f465aa200| grep state
    skc_state = 7 '\a', << TCP_CLOSE
..
  skc_refcnt = {
      refs = {
        counter = 1
....
  sk_wmem_alloc = {
    refs = {
      counter = 3

sk_err = 104,
sk_destruct = 0xffffffffc06d6240 <socket_destruct_func>,

}
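
To make the role of the two counters concrete, here is a toy user-space model of how they gate the destructor. It assumes the usual sock_put() / sk_free() / sock_wfree() chain, where the last sk_refcnt reference drops the initial sk_wmem_alloc reference and the destructor runs only once sk_wmem_alloc reaches zero. The class and method names are hypothetical, and the numbers are picked to mirror the dump above, not to explain where the extra counts come from.

```python
class Sock:
    """Toy model of the two counters that gate socket destruction."""

    def __init__(self, destruct):
        self.refcnt = 1           # models sk_refcnt
        self.wmem_alloc = 1       # models sk_wmem_alloc, which starts at 1
        self.destruct = destruct  # models the sk_destruct callback
        self.destroyed = False

    def charge_skb(self, truesize):
        """A transmit buffer in flight charges its size to wmem_alloc."""
        self.wmem_alloc += truesize

    def skb_freed(self, truesize):
        """Freeing a transmit buffer uncharges it (sock_wfree-like)."""
        self.wmem_alloc -= truesize
        if self.wmem_alloc == 0:
            self._destroy()

    def put(self):
        """Drop one reference (sock_put-like); the last one calls sk_free."""
        self.refcnt -= 1
        if self.refcnt == 0:
            self._sk_free()

    def _sk_free(self):
        """Drop the initial wmem_alloc reference (sk_free-like)."""
        self.wmem_alloc -= 1
        if self.wmem_alloc == 0:
            self._destroy()

    def _destroy(self):
        self.destroyed = True
        self.destruct(self)


# A socket stuck in the state seen in the dump: nobody drops the last
# sk_refcnt reference and two units stay charged to sk_wmem_alloc.
destructed = []
leaked = Sock(destructed.append)
leaked.charge_skb(2)      # wmem_alloc is now 3, refcnt is still 1

# The same socket once both counters are released.
healthy = Sock(destructed.append)
healthy.charge_skb(2)
healthy.put()             # refcnt reaches 0, wmem_alloc drops from 3 to 2
healthy.skb_freed(2)      # wmem_alloc reaches 0, destructor runs
```

In this model the destructor hook is never reached for the first socket, which is the symptom we see with our replacement sk_destruct routine.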

 tcp_header_len = 32,
  gso_segs = 15,
  pred_flags = 1493504128,
  bytes_received = 1,
  segs_in = 4,
  data_segs_in = 0,
  rcv_nxt = 74596444,
  copied_seq = 74596443,
  rcv_wup = 74596444,
  snd_nxt = 2529374595,
  segs_out = 11,
  data_segs_out = 10,
  bytes_sent = 14480,
  bytes_acked = 0,
  dsack_dups = 0,
  snd_una = 2529360115,
  snd_sml = 2529360115,
  rcv_tstamp = 521240444,
  lsndtime = 521240445,
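
The dump and the capture can be cross-checked with a little arithmetic. The sketch below only uses values copied from above. The state names follow the kernel's TCP state enum, and 104 is ECONNRESET in the generic Linux errno numbering (other platforms number it differently); the dictionary and function names are made up for illustration.

```python
SEGMENT_LEN = 1448  # Len= of every data segment in the capture

# Values copied from the crash output above.
dump = {
    "skc_state": 7, "sk_err": 104,
    "snd_una": 2529360115, "snd_nxt": 2529374595,
    "rcv_nxt": 74596444, "copied_seq": 74596443,
    "bytes_sent": 14480, "bytes_acked": 0, "data_segs_out": 10,
}

# Values copied from the capture (frames 100, 101 and 110).
client_fin_seq = 74596443
first_data_seq = 2529360115
last_data_seq = 2529373147

TCP_STATES = {
    1: "TCP_ESTABLISHED", 2: "TCP_SYN_SENT", 3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1", 5: "TCP_FIN_WAIT2", 6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE", 8: "TCP_CLOSE_WAIT", 9: "TCP_LAST_ACK",
    10: "TCP_LISTEN", 11: "TCP_CLOSING", 12: "TCP_NEW_SYN_RECV",
}
LINUX_ERRNO = {104: "ECONNRESET"}  # generic Linux numbering only


def seq_diff(later, earlier):
    """Distance between two 32-bit sequence numbers, allowing for wraparound."""
    return (later - earlier) % 2**32


state_name = TCP_STATES[dump["skc_state"]]
error_name = LINUX_ERRNO[dump["sk_err"]]

# Everything the server sent is still unacknowledged.
unacked = seq_diff(dump["snd_nxt"], dump["snd_una"])

# The last data segment in the capture ends exactly at snd_nxt.
end_of_capture = (last_data_seq + SEGMENT_LEN) % 2**32

# The client's FIN was received: it consumed one sequence number.
fin_consumed = seq_diff(dump["rcv_nxt"], client_fin_seq)
```

Here unacked comes out to 14480, which equals bytes_sent and data_segs_out * 1448, while bytes_acked is 0. So the socket is in TCP_CLOSE with ECONNRESET recorded, and all ten data segments were sent but never acknowledged.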

Regards,
Nagaraj P Arankal


* Re: socket leaks observed in Linux kernel's passive close path
From: Andrew Lunn @ 2022-10-13 14:20 UTC (permalink / raw)
  To: Arankal, Nagaraj; +Cc: netdev

On Thu, Oct 13, 2022 at 06:47:56AM +0000, Arankal, Nagaraj wrote:
> Description:
> We have observed a race condition in which the kernel does not free sockets in the scenario described below.
> We have a kernel module that monitors TCP connection state changes. As part of its functionality, it replaces the default sk_destruct function of all TCP sockets with a module-specific routine. It looks like sk_destruct() is not invoked in the following scenario, so the sockets are leaked despite a RESET being received from the remote end.
> 
> 1.	Establish a TCP connection between Host A and Host B.
> 2.	Have the client on Host B call close() immediately after the three-way handshake.
> 3.	The server end sends a large amount of data to the client and then calls close() on the FD.
> 4.	The FIN from the client is not ACKed while the server is busy sending the data.
> 5.	A RESET is received from the remote client.
> 6.	sk_destruct() is not invoked because sk_refcnt or sk_wmem_alloc is still non-zero.
> 
> Kernel version: Debian Linux 4.19.y(238,247)

Is this reproducible with a modern kernel? v6.0? If this is already
fixed, we need to identify what change fixed it, and get it back
ported. If it is broken in v6.0, and net-next, it first needs fixing
in net-next, and then back porting to the different LTS kernels.

   Andrew


* RE: socket leaks observed in Linux kernel's passive close path
From: Arankal, Nagaraj @ 2022-10-13 14:44 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev

Hi Andrew,
Thanks for looking into this. I have not tested this on a v6.0 kernel, and as far as I know there have been no fixes in this area. That is why I posted this, as it seems to be a valid case.

Thanks,
Nagaraj P Arankal



* Re: socket leaks observed in Linux kernel's passive close path
From: Andrew Lunn @ 2022-10-13 15:59 UTC (permalink / raw)
  To: Arankal, Nagaraj; +Cc: netdev

On Thu, Oct 13, 2022 at 02:44:02PM +0000, Arankal, Nagaraj wrote:
> Hi Andrew,
> Thanks for looking into this. I have not tested this on a v6.0 kernel, and as far as I know there have been no fixes in this area. That is why I posted this, as it seems to be a valid case.

Please don't top post. And set your mailer to wrap lines at around 78
characters.

Please post your test results for v6.0. Just because you have not seen
any fixes in the last 4 years does not mean it has not been fixed.

    Andrew

