From: Jason Xing
Date: Sat, 8 May 2021 11:34:18 +0800
Subject: Re: soft lockup in __inet_lookup_established() function
To: David Miller, Hideaki YOSHIFUJI, dsahern@kernel.org, kuba@kernel.org
Cc: netdev, LKML, liweishi
X-Mailing-List: linux-kernel@vger.kernel.org

Does anyone have any suggestions? This issue has been haunting me for a while.

thanks,
jason

On Thu, Apr 29, 2021 at 8:15 PM Jason Xing wrote:
>
> Hello,
>
> I've encountered a serious issue that causes an infinite loop in the
> __inet_lookup_established() function, which lasts until I reboot the
> machine manually. It happens randomly across thousands of machines
> running the 4.19 kernel. Once the soft lockup is triggered, I can no
> longer ping or ssh to the machine, whatever I try, until it is rebooted.
>
> Does anyone have any clue on how to dig into this part of the code?
> I highly suspect that it has something to do with corruption of the
> nulls_list, so that the sk lookup can never break out of its loop over
> the hashinfo chain.
>
> The call traces, attached below, are all identical:
> [1048271.465028] watchdog: BUG: soft lockup - CPU#20 stuck for 22s!
> [swapper/20:0]
> [1048271.473669] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel
> udp_diag tcp_diag inet_diag nf_conntrack_netlink nfnetlink
> br_netfilter bridge stp llc xt_statistic xt_nat ipt_MASQUERADE
> ipt_REJECT nf_reject_ipv4 xt_mark xt_addrtype xt_comment xt_conntrack
> ...
> [1048271.553597] RIP: 0010:__inet_lookup_established+0x5a/0x190
> ...
> [1048271.660309] Call Trace:
> [1048271.663135]
> [1048271.665432] tcp_v4_early_demux+0xaa/0x150
> [1048271.669812] ip_rcv_finish+0x171/0x410
> [1048271.673941] ip_rcv+0x273/0x362
> [1048271.677360] ? inet_add_protocol.cold.1+0x1e/0x1e
> [1048271.682354] __netif_receive_skb_core+0xac2/0xbb0
> [1048271.687351] ? inet_gro_receive+0x22a/0x2d0
> [1048271.692001] ? ktime_get_with_offset+0x4d/0xc0
> [1048271.696725] netif_receive_skb_internal+0x42/0xf0
> [1048271.701717] napi_gro_receive+0xba/0xe0
> [1048271.705839] receive_buf+0x165/0xa50 [virtio_net]
> [1048271.710839] ? receive_buf+0x165/0xa50 [virtio_net]
> [1048271.716053] ? vring_unmap_one+0x16/0x80
> [1048271.720308] ? detach_buf+0x69/0x110
> [1048271.724218] virtnet_poll+0xc0/0x2ea [virtio_net]
> [1048271.729202] net_rx_action+0x149/0x3b0
> [1048271.733234] __do_softirq+0xe3/0x30a
> [1048271.737095] irq_exit+0x100/0x110
> [1048271.740882] do_IRQ+0x85/0xd0
> [1048271.744143] common_interrupt+0xf/0xf
> [1048271.748104]
>
> thanks,
> jason
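To make the nulls_list suspicion concrete, here is a simplified userspace sketch (not kernel code) of how a lookup over a nulls-terminated hash chain can fail to terminate. It assumes the hlist_nulls convention that a chain ends not with NULL but with an odd marker value encoding the bucket index, and that, as in my reading of the 4.19 lookup, the walk restarts from the bucket head whenever the final marker does not match the expected slot. The names demo_node, make_nulls, demo_lookup and the max_steps guard are hypothetical illustrations; the real lookup has no step limit, which is why a cycle in the chain or a marker for the wrong bucket would keep the CPU spinning.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of an hlist_nulls chain: the last ->next is not NULL but
 * an odd "nulls" marker whose upper bits hold the bucket (slot) index. */
struct demo_node {
	struct demo_node *next; /* real node (even address) or nulls marker */
	unsigned int key;
};

enum demo_result { DEMO_FOUND, DEMO_NOT_FOUND, DEMO_GAVE_UP };

/* Encode a bucket index as an odd pointer value that is never dereferenced. */
static struct demo_node *make_nulls(uintptr_t slot)
{
	assert(slot <= (UINTPTR_MAX >> 1)); /* the shift must not lose bits */
	return (struct demo_node *)((slot << 1) | 1u);
}

/* Real nodes are pointer-aligned, so only a marker has the low bit set. */
static int is_a_nulls(const struct demo_node *p)
{
	return ((uintptr_t)p & 1u) != 0;
}

static uintptr_t get_nulls_value(const struct demo_node *p)
{
	return (uintptr_t)p >> 1;
}

/* Walk the chain looking for 'key'. Ending on a marker for a different slot
 * means the walk drifted onto another bucket, so it restarts from 'first'.
 * max_steps is a hypothetical debugging guard: without it, a cycle or a
 * marker that never matches 'slot' would loop forever. */
static enum demo_result demo_lookup(struct demo_node *first, uintptr_t slot,
				    unsigned int key, unsigned long max_steps)
{
	unsigned long steps = 0;
	const struct demo_node *p;

begin:
	for (p = first; !is_a_nulls(p); p = p->next) {
		if (++steps > max_steps)
			return DEMO_GAVE_UP; /* e.g. a node points back into the chain */
		if (p->key == key)
			return DEMO_FOUND;
	}
	if (get_nulls_value(p) != slot) {
		if (++steps > max_steps)
			return DEMO_GAVE_UP; /* marker never matches: endless restarts */
		goto begin;
	}
	return DEMO_NOT_FOUND;
}
```

With a healthy chain a -> b -> marker(3), looking up a missing key in slot 3 returns DEMO_NOT_FOUND. Pointing b->next back at a, or ending the chain with marker(5) while searching slot 3, makes the walk hit the guard and return DEMO_GAVE_UP; these are the two corruption shapes a soft lockup inside the lookup would be consistent with.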