From: Cong Wang
Date: Thu, 2 Apr 2020 21:30:15 -0700
Subject: Re: [PATCH net] net/sched: Don't print dump stack in event of transmission timeout
To: David Miller
Cc: Leon Romanovsky, Jakub Kicinski, leonro@mellanox.com, Arjan van de Ven, Jamal Hadi Salim, Jiri Pirko, Linux Kernel Network Developers, itayav@mellanox.com
References: <20200402152336.538433-1-leon@kernel.org> <20200402.180218.940555077368617365.davem@davemloft.net>
In-Reply-To: <20200402.180218.940555077368617365.davem@davemloft.net>
X-Mailing-List: netdev@vger.kernel.org

On Thu, Apr 2, 2020 at 6:02 PM David Miller wrote:
>
> From: Leon Romanovsky
> Date: Thu, 2 Apr 2020 18:23:36 +0300
>
> > In event of transmission timeout, the drivers are given an opportunity
> > to recover and continue to work after some in-house cleanups.
> >
> > Such event can be caused by HW bugs, wrong congestion configurations
> > and many more other scenarios. In such case, users are interested to
> > get a simple "NETDEV WATCHDOG ... " print, which points to the relevant
> > netdevice in trouble.
> >
> > The dump stack printed later was added in the commit b4192bbd85d2
> > ("net: Add a WARN_ON_ONCE() to the transmit timeout function") to give
> > extra information, like list of the modules and which driver is involved.
> >
> > While the latter is already printed in "NETDEV WATCHDOG ... ", the list
> > of modules rarely needed and can be collected later.
> >
> > So let's remove the WARN_ONCE() and make dmesg look more user-friendly in
> > large cluster setups.
>
> Software bugs play into these situations and on at least two or three
> occasions I know that the backtrace hinted at the cause of the bug.
>

I don't see how a timer stack trace, which only shows the watchdog timer
path, could help debug this issue in any scenario; the messages outside of
this stack trace are what is actually helpful.

On the other hand, a stack trace does help to get some attention via ABRT,
but at least in our case we now use rasdaemon to capture this, so I am 100%
fine with removing this stack trace.

Thanks.
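For readers less familiar with the mechanism under discussion, here is a minimal userspace sketch in C of the trade-off. It is a toy model, not the kernel's dev_watchdog() code: the names `struct txq_model`, `watchdog_check()` and `driver_tx_timeout()` are hypothetical, time is a plain tick counter, and the split between a short per-event message and a verbose report that fires only once (in the spirit of WARN_ONCE()) is an illustrative assumption rather than the exact kernel behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of one transmit queue (not kernel code). */
struct txq_model {
    bool stopped;              /* queue stopped, e.g. because the HW ring is full */
    unsigned long trans_start; /* tick of the last successful transmit */
};

static bool warned;          /* WARN_ONCE()-style "already fired" flag */
static int verbose_reports;  /* how many times the one-time report fired */
static int recoveries;       /* how many times the recovery hook ran */

/* Hypothetical stand-in for a driver's recovery hook (cf. ndo_tx_timeout). */
static void driver_tx_timeout(const char *name, unsigned int queue)
{
    recoveries++;
    printf("%s: driver resets queue %u\n", name, queue);
}

/*
 * One watchdog pass at tick `now`: find the first stopped queue whose last
 * transmit is older than `timeo` ticks, report it, and ask the driver to
 * recover. Returns the queue index, or -1 if nothing timed out.
 */
static int watchdog_check(const char *name, const struct txq_model *q,
                          unsigned int nq, unsigned long now,
                          unsigned long timeo)
{
    assert(q != NULL);
    for (unsigned int i = 0; i < nq; i++) {
        /* Unsigned subtraction stays correct across counter wrap-around. */
        if (q[i].stopped && now - q[i].trans_start > timeo) {
            /* Short, per-event message naming the device and queue. */
            printf("NETDEV WATCHDOG: %s: transmit queue %u timed out\n",
                   name, i);
            /* Verbose report (a backtrace in the kernel) only on first hit. */
            if (!warned) {
                warned = true;
                verbose_reports++;
                printf("  [one-time stack trace would be printed here]\n");
            }
            driver_tx_timeout(name, i);
            return (int)i;
        }
    }
    return -1;
}
```

For example, with queue 1 stopped since tick 100 and a 400-tick timeout, a pass at tick 600 reports queue 1, prints the one-time trace placeholder and calls the recovery hook. If the queue later stalls again (last transmit at tick 700), a pass at tick 1200 reports and recovers it again, this time without the placeholder. The short message keeps pointing at the device in trouble on every event, while the verbose report appears at most once, which is the cost/benefit being argued in the thread.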