linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
@ 2021-10-20 23:24 Jon Maxwell
  2021-10-21  1:10 ` Eric Dumazet
  2021-10-21  5:35 ` kernel test robot
  0 siblings, 2 replies; 5+ messages in thread
From: Jon Maxwell @ 2021-10-20 23:24 UTC (permalink / raw)
  To: edumazet; +Cc: davem, yoshfuji, dsahern, kuba, netdev, linux-kernel, jmaxwell37

A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that 
the write_queue was not empty as determined by tcp_write_queue_empty() but the 
sk_buff containing the FIN flag had been freed and the socket was zombied in 
that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.

Some instrumentation was added to the kernel and it was found that there is a 
timing window where tcp_sendmsg() can run after tcp_send_fin().

tcp_sendmsg() will hit an error, for example:

1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
1270 ▹       ▹       goto do_error;↩

tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.

If the other side sends a FIN packet the socket will transition to CLOSING and
remain that way until the system is rebooted.

Fix this by checking for the FIN flag in the sk_buff and don't free it if that 
is the case. Testing confirmed that fixed the issue.

Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c2d9830136d2..d2b06d8f0c37 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -938,7 +938,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
  */
 void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 {
-	if (skb && !skb->len) {
+	if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
 		tcp_unlink_write_queue(skb, sk);
 		if (tcp_write_queue_empty(sk))
 			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
  2021-10-20 23:24 [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb() Jon Maxwell
@ 2021-10-21  1:10 ` Eric Dumazet
  2021-10-21  1:48   ` Jonathan Maxwell
  2021-10-21  5:35 ` kernel test robot
  1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2021-10-21  1:10 UTC (permalink / raw)
  To: Jon Maxwell
  Cc: David Miller, Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski,
	netdev, LKML

On Wed, Oct 20, 2021 at 4:25 PM Jon Maxwell <jmaxwell37@gmail.com> wrote:
>
> A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that
> the write_queue was not empty as determined by tcp_write_queue_empty() but the
> sk_buff containing the FIN flag had been freed and the socket was zombied in
> that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.
>
> Some instrumentation was added to the kernel and it was found that there is a
> timing window where tcp_sendmsg() can run after tcp_send_fin().
>
> tcp_sendmsg() will hit an error, for example:
>
> 1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
> 1270 ▹       ▹       goto do_error;↩
>
> tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
> TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.
>
> If the other side sends a FIN packet the socket will transition to CLOSING and
> remain that way until the system is rebooted.
>
> Fix this by checking for the FIN flag in the sk_buff and don't free it if that
> is the case. Testing confirmed that fixed the issue.
>
> Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
> ---
>  net/ipv4/tcp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index c2d9830136d2..d2b06d8f0c37 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -938,7 +938,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>   */
>  void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
>  {
> -       if (skb && !skb->len) {
> +       if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
>                 tcp_unlink_write_queue(skb, sk);
>                 if (tcp_write_queue_empty(sk))
>                         tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
>

Very nice catch !

The FIN flag is a really special case here.

What we need is to make sure the skb is 'empty' .

What about using a single condition ?

if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
  2021-10-21  1:10 ` Eric Dumazet
@ 2021-10-21  1:48   ` Jonathan Maxwell
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Maxwell @ 2021-10-21  1:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski,
	netdev, LKML

On Thu, Oct 21, 2021 at 12:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Oct 20, 2021 at 4:25 PM Jon Maxwell <jmaxwell37@gmail.com> wrote:
> >
> > A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that
> > the write_queue was not empty as determined by tcp_write_queue_empty() but the
> > sk_buff containing the FIN flag had been freed and the socket was zombied in
> > that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.
> >
> > Some instrumentation was added to the kernel and it was found that there is a
> > timing window where tcp_sendmsg() can run after tcp_send_fin().
> >
> > tcp_sendmsg() will hit an error, for example:
> >
> > 1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
> > 1270 ▹       ▹       goto do_error;↩
> >
> > tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
> > TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.
> >
> > If the other side sends a FIN packet the socket will transition to CLOSING and
> > remain that way until the system is rebooted.
> >
> > Fix this by checking for the FIN flag in the sk_buff and don't free it if that
> > is the case. Testing confirmed that fixed the issue.
> >
> > Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
> > Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
> > ---
> >  net/ipv4/tcp.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index c2d9830136d2..d2b06d8f0c37 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -938,7 +938,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
> >   */
> >  void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
> >  {
> > -       if (skb && !skb->len) {
> > +       if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
> >                 tcp_unlink_write_queue(skb, sk);
> >                 if (tcp_write_queue_empty(sk))
> >                         tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> >
>
> Very nice catch !
>

Thanks Eric.

> The FIN flag is a really special case here.
>
> What we need is to make sure the skb is 'empty' .
>
> What about using a single condition ?
>
> if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq)

Good call as the end_seq will be +1 for a FIN. So that's better.

Let me give the customer a kernel with your idea:

--- net/ipv4/tcp.c 2021-10-20 22:50:35.836001950 +0530
+++ net/ipv4/tcp.c.patch 2021-10-21 01:42:08.493569483 +0530
@@ -955,7 +955,7 @@
  */
 void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 {
- if (skb && !skb->len) {
+ if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) {
  tcp_unlink_write_queue(skb, sk);
  if (tcp_write_queue_empty(sk))
  tcp_chrono_stop(sk, TCP_CHRONO_BUSY);

I'll ask the customer to confirm that the v1 patch as above also resolves
the issue. Although I expect it will.

Then I'll resubmit a v1 patch with your suggestion probably early next week.

Regards

Jon

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
  2021-10-20 23:24 [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb() Jon Maxwell
  2021-10-21  1:10 ` Eric Dumazet
@ 2021-10-21  5:35 ` kernel test robot
  1 sibling, 0 replies; 5+ messages in thread
From: kernel test robot @ 2021-10-21  5:35 UTC (permalink / raw)
  To: Jon Maxwell, edumazet
  Cc: llvm, kbuild-all, davem, yoshfuji, dsahern, kuba, netdev,
	linux-kernel, jmaxwell37

[-- Attachment #1: Type: text/plain, Size: 3028 bytes --]

Hi Jon,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jon-Maxwell/tcp-don-t-free-a-FIN-sk_buff-in-tcp_remove_empty_skb/20211021-072644
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 2641b62d2fab52648e34cdc6994b2eacde2d27c1
config: i386-randconfig-a004-20211021 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 3cea2505fd8d99a9ba0cb625aecfe28a47c4e3f8)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/87f5ea309107de5183ec0bd7d0b48ec90546d350
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jon-Maxwell/tcp-don-t-free-a-FIN-sk_buff-in-tcp_remove_empty_skb/20211021-072644
        git checkout 87f5ea309107de5183ec0bd7d0b48ec90546d350
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/ipv4/tcp.c:941:26: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
           if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
                                   ^                           ~
   net/ipv4/tcp.c:941:26: note: add parentheses after the '!' to evaluate the bitwise operator first
           if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
                                   ^
                                    (                                      )
   net/ipv4/tcp.c:941:26: note: add parentheses around left hand side expression to silence this warning
           if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
                                   ^
                                   (                          )
   1 warning generated.


vim +941 net/ipv4/tcp.c

   932	
   933	/* In some cases, both sendpage() and sendmsg() could have added
   934	 * an skb to the write queue, but failed adding payload on it.
   935	 * We need to remove it to consume less memory, but more
   936	 * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
   937	 * users.
   938	 */
   939	void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
   940	{
 > 941		if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
   942			tcp_unlink_write_queue(skb, sk);
   943			if (tcp_write_queue_empty(sk))
   944				tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
   945			sk_wmem_free_skb(sk, skb);
   946		}
   947	}
   948	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29372 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()
       [not found] <202110221513.52ubgVaN-lkp@intel.com>
@ 2021-10-24 17:18 ` kernel test robot
  0 siblings, 0 replies; 5+ messages in thread
From: kernel test robot @ 2021-10-24 17:18 UTC (permalink / raw)
  To: Jon Maxwell
  Cc: kbuild-all, Eric Dumazet, David S . Miller, yoshfuji,
	David Ahern, Jakub Kicinski, netdev, Linux Kernel Mailing List,
	jmaxwell37

[-- Attachment #1: Type: text/plain, Size: 2731 bytes --]

Hi Jon,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jon-Maxwell/tcp-don-t-free-a-FIN-sk_buff-in-tcp_remove_empty_skb/20211021-072644
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 2641b62d2fab52648e34cdc6994b2eacde2d27c1
config: i386-randconfig-s002-20211021 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
         # apt-get install sparse
         # sparse version: v0.6.4-dirty
         # https://github.com/0day-ci/linux/commit/87f5ea309107de5183ec0bd7d0b48ec90546d350
         git remote add linux-review https://github.com/0day-ci/linux
         git fetch --no-tags linux-review Jon-Maxwell/tcp-don-t-free-a-FIN-sk_buff-in-tcp_remove_empty_skb/20211021-072644
         git checkout 87f5ea309107de5183ec0bd7d0b48ec90546d350
         # save the attached .config to linux build tree
         make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
 >> net/ipv4/tcp.c:941:61: sparse: sparse: dubious: !x & y

vim +941 net/ipv4/tcp.c

0c54b85f282812 Ilpo Järvinen 2009-03-14  932
fdfc5c8594c24c Eric Dumazet  2019-08-26  933  /* In some cases, both sendpage() and sendmsg() could have added
fdfc5c8594c24c Eric Dumazet  2019-08-26  934   * an skb to the write queue, but failed adding payload on it.
fdfc5c8594c24c Eric Dumazet  2019-08-26  935   * We need to remove it to consume less memory, but more
fdfc5c8594c24c Eric Dumazet  2019-08-26  936   * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
fdfc5c8594c24c Eric Dumazet  2019-08-26  937   * users.
fdfc5c8594c24c Eric Dumazet  2019-08-26  938   */
b796d04bd014fd Paolo Abeni   2020-11-16  939  void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
fdfc5c8594c24c Eric Dumazet  2019-08-26  940  {
87f5ea309107de Jon Maxwell   2021-10-21 @941  	if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
fdfc5c8594c24c Eric Dumazet  2019-08-26  942  		tcp_unlink_write_queue(skb, sk);
fdfc5c8594c24c Eric Dumazet  2019-08-26  943  		if (tcp_write_queue_empty(sk))
fdfc5c8594c24c Eric Dumazet  2019-08-26  944  			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
fdfc5c8594c24c Eric Dumazet  2019-08-26  945  		sk_wmem_free_skb(sk, skb);
fdfc5c8594c24c Eric Dumazet  2019-08-26  946  	}
fdfc5c8594c24c Eric Dumazet  2019-08-26  947  }
fdfc5c8594c24c Eric Dumazet  2019-08-26  948

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33947 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-24 17:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-20 23:24 [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb() Jon Maxwell
2021-10-21  1:10 ` Eric Dumazet
2021-10-21  1:48   ` Jonathan Maxwell
2021-10-21  5:35 ` kernel test robot
     [not found] <202110221513.52ubgVaN-lkp@intel.com>
2021-10-24 17:18 ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).