From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F6FBC49ED6 for ; Wed, 11 Sep 2019 17:36:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 588392084F for ; Wed, 11 Sep 2019 17:36:59 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Ma1ydwHg" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729677AbfIKRg6 (ORCPT ); Wed, 11 Sep 2019 13:36:58 -0400 Received: from mail-pl1-f195.google.com ([209.85.214.195]:33848 "EHLO mail-pl1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729349AbfIKRg6 (ORCPT ); Wed, 11 Sep 2019 13:36:58 -0400 Received: by mail-pl1-f195.google.com with SMTP id d3so10501983plr.1 for ; Wed, 11 Sep 2019 10:36:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ve4B7Xa9EmF4SP+XPGJsIOYpbQhJ/xQj4E3/FabbKAQ=; b=Ma1ydwHgKoEuZyVJM5s5WhRmZjOQMPKoQmQfOR0RDiNA8JYyVJbsxDcJZjQQr4jP4i ah9GG5MsszaqO1fGPxpDJPxRfkF2SfEpmZjn/N0r0EOsiDPtGr5R7G8n3VCuUUOp7NYK xIF8uwcuCiOkdVgkVJig0n3lIl2FDtxLxmGQF1fMUGOFxLtFQ/F95bMeaOWqniEH4iMj TyDiZiGaL0sh0op5L0XdwdLv/5G8R+uvAjWJnDOvBaO/vzmjlg4daLuPOJu0uhDc71/F NPz6MUUHNayYAd1lPeIyQQ9vpMIhQhrEXVwYk7MmeEPeefKForDSgoHxeLsMiRzTaeYD +yrQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ve4B7Xa9EmF4SP+XPGJsIOYpbQhJ/xQj4E3/FabbKAQ=; b=l767wrXP4efadhSxD1yc1tLXhUwvetd5VCbWjgen0wQnSY0Uzo/yBflREJojaZqNzL Ohnv78w8Rpdmp5z5UsSh3ztDREqnghupMGj3T4daA0lRdH1muktrahVKcKpz3b8w4TDb GlYvHWDSyQDf1+lDqkSHL8mPtHT4584Ep5zLVF0FFSIVNMxDH5PpDUQIYuzz7wJvKFqM F3aSRv0MeKqm2O+gAiM+6sXppvyIchacbndXCCSNFLMT1Jjeai7Iuz+EGw8aGyHoUiHO BEfJ7iFtx9+JaPPzM5En4LJl1HHlQk5zossQtbfgs6UrdE6fpe5sCrWFGp5wRYd1L6hg wxxw== X-Gm-Message-State: APjAAAWb3gsDWk/sW/w4v4uv1AiYyx27kHnvRSsPo2sySzo2o8Wg7ifn XRYT8+6qO1MRpinB9wPJa+EwiBRW0JSK4TNCY6w= X-Google-Smtp-Source: APXvYqzs9dp/BGaBdu+3g8Rlv0efssKW/i6lpW2HucH4ZhlTRXiG990YIchWlFCosAHAbcPYpGXSy7w+JE4nHtqL54A= X-Received: by 2002:a17:902:a983:: with SMTP id bh3mr38037652plb.311.1568223416435; Wed, 11 Sep 2019 10:36:56 -0700 (PDT) MIME-Version: 1.0 References: <20190826161915.81676-1-edumazet@google.com> In-Reply-To: <20190826161915.81676-1-edumazet@google.com> From: Christoph Paasch Date: Wed, 11 Sep 2019 10:36:45 -0700 Message-ID: Subject: Re: [PATCH net] tcp: remove empty skb from write queue in error cases To: Eric Dumazet Cc: "David S . Miller" , netdev , Soheil Hassas Yeganeh , Neal Cardwell , Eric Dumazet , Jason Baron , Vladimir Rutsky Content-Type: text/plain; charset="UTF-8" Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Hello, On Mon, Aug 26, 2019 at 11:04 AM Eric Dumazet wrote: > > Vladimir Rutsky reported stuck TCP sessions after memory pressure > events. Edge Trigger epoll() user would never receive an EPOLLOUT > notification allowing them to retry a sendmsg(). > > Jason tested the case of sk_stream_alloc_skb() returning NULL, > but there are other paths that could lead both sendmsg() and sendpage() > to return -1 (EAGAIN), with an empty skb queued on the write queue. > > This patch makes sure we remove this empty skb so that > Jason code can detect that the queue is empty, and > call sk->sk_write_space(sk) accordingly. > > Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty") > Signed-off-by: Eric Dumazet > Cc: Jason Baron > Reported-by: Vladimir Rutsky > Cc: Soheil Hassas Yeganeh > Cc: Neal Cardwell > --- > net/ipv4/tcp.c | 30 ++++++++++++++++++++---------- > 1 file changed, 20 insertions(+), 10 deletions(-) I got syzkaller complaining now on 4.14.143 with the following reproducer: # {Threaded:true Collide:true Repeat:true RepeatTimes:0 Procs:1 Sandbox: Fault:false FaultCall:-1 FaultNth:0 EnableTun:false UseTmpDir:false EnableCgroups:false EnableNetdev:false ResetNet:false HandleSegv:false Repro:false Trace:false} r0 = socket$inet_tcp(0x2, 0x1, 0x0) setsockopt$inet_tcp_TCP_REPAIR(r0, 0x6, 0x13, &(0x7f0000000040)=0x1, 0x4) setsockopt$inet_tcp_TCP_REPAIR_QUEUE(r0, 0x6, 0x14, &(0x7f00000012c0)=0x2, 0x4) setsockopt$inet_tcp_int(r0, 0x6, 0x19, &(0x7f0000000000)=0x9, 0x4) setsockopt$inet_tcp_TCP_MD5SIG(r0, 0x6, 0xe, &(0x7f00000001c0)={@in={{0x2, 0x0, @empty}}, 0x0, 0x2, 0x0, "c157cf4809151e5e89cfd6d934fbe981ec8ff6afc252ccf486c325c7ff3d35f3a89412a5cb6430e169092617df2ba65bf0ab844572e4e7dd4ece8ec1de5ac1ccd870067b018cb3b1f05f2391d872b67d"}, 0xd8) connect$inet(r0, &(0x7f0000000080)={0x2, 0x0, @dev={0xac, 0x14, 0x14, 0x1d}}, 0x10) sendto(r0, 0x0, 0x87, 0x0, 0x0, 0x391) kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 2529 Comm: syz-executor709 Not tainted 4.14.143 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011 task: ffff8880677fdc00 task.stack: ffff8880642b0000 RIP: 0010:tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350 RSP: 0018:ffff8880642bf718 EFLAGS: 00010206 RAX: 0000000000000014 RBX: 0000000000000087 RCX: ffff88806a794f50 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000a0 RBP: ffff8880642bfaa8 R08: 0000000000000006 R09: ffff8880677fe3a0 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffff88806a794f50 R14: ffff88806a794d00 R15: 0000000000000087 FS: 00007f644b697700(0000) GS:ffff88806cf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffcd37370b0 CR3: 00000000679f2006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcp_sendmsg+0x2a/0x40 net/ipv4/tcp.c:1533 inet_sendmsg+0x173/0x4e0 net/ipv4/af_inet.c:784 sock_sendmsg_nosec net/socket.c:646 [inline] sock_sendmsg+0xc3/0x100 net/socket.c:656 SYSC_sendto+0x35d/0x5e0 net/socket.c:1766 do_syscall_64+0x241/0x680 arch/x86/entry/common.c:292 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x7f644afc6469 RSP: 002b:00007f644b696f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000602130 RCX: 00007f644afc6469 RDX: 0000000000000087 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000602138 R08: 0000000000000000 R09: 0000000000000391 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000060213c R13: 00007ffcd373700f R14: 00007f644b677000 R15: 0000000000000003 Code: 74 08 3c 03 0f 8e f1 32 00 00 8b 85 98 fd ff ff 89 85 60 fd ff ff 48 8b 85 70 fd ff ff 48 8d b8 a0 00 00 00 48 89 f8 48 c1 e8 03 <42> 0f b6 04 20 84 c0 74 06 0f 8e d2 32 00 00 4c 8b bd 70 fd ff RIP: tcp_sendmsg_locked+0x6b4/0x4390 net/ipv4/tcp.c:1350 RSP: ffff8880642bf718 ---[ end trace 70f07f242cd3b9d8 ]--- It's because skb is NULL in tcp_sendmsg_locked at: skb = tcp_write_queue_tail(sk); if (tcp_send_head(sk)) { if (skb->ip_summed == CHECKSUM_NONE) I think we need this here on pre-rb-tree kernels : diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5ce069ce2a97..efe767e20d01 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -924,8 +924,7 @@ static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) { if (skb && !skb->len) { tcp_unlink_write_queue(skb, sk); - if (tcp_write_queue_empty(sk)) - tcp_chrono_stop(sk, TCP_CHRONO_BUSY); + tcp_check_send_head(sk, skb); sk_wmem_free_skb(sk, skb); } } Does that look good? Thanks, Christoph