From: Cong Wang
Date: Wed, 1 May 2019 17:50:43 -0700
Subject: Re: [Patch net-next] net: add a generic tracepoint for TX queue timeout
To: Eran Ben Elisha
Cc: "netdev@vger.kernel.org", Jiri Pirko
References: <20190430185009.20456-1-xiyou.wangcong@gmail.com> <68f5b7e3-4022-edd4-8d18-752b3dfc500f@mellanox.com>
In-Reply-To: <68f5b7e3-4022-edd4-8d18-752b3dfc500f@mellanox.com>
X-Mailing-List: netdev@vger.kernel.org

On Wed, May 1, 2019 at 6:11 AM Eran Ben Elisha wrote:
>
>
>
> On 4/30/2019 9:50 PM, Cong Wang wrote:
> > Although devlink health report does a nice job on reporting TX
> > timeout and other NIC errors, unfortunately it requires drivers
> > to support it but currently only mlx5 has implemented it.
>
> The devlink health was never intended to be the generic mechanism for
> monitoring all driver's TX timeouts notifications. mlx5e driver chose to
> handle TX timeout notification by reporting it to the newly devlink
> health mechanism.

Understood.
>
> > Before other drivers could catch up, it is useful to have a
> > generic tracepoint to monitor this kind of TX timeout. We have
> > been suffering TX timeout with different drivers, we plan to
> > start to monitor it with rasdaemon which just needs a new tracepoint.
>
> Great idea to suggest a generic trace message that can be monitored over
> all drivers.
>
> >
> > Sample output:
> >
> > ksoftirqd/1-16 [001] ..s2 144.043173: net_dev_xmit_timeout: dev=ens3 driver=e1000 queue=0
> >
> > Cc: Eran Ben Elisha
> > Cc: Jiri Pirko
> > Signed-off-by: Cong Wang
> > ---
> > include/trace/events/net.h | 23 +++++++++++++++++++++++
> > net/sched/sch_generic.c | 2 ++
> > 2 files changed, 25 insertions(+)
> >
> > diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> > index 1efd7d9b25fe..002d6f04b9e5 100644
> > --- a/include/trace/events/net.h
> > +++ b/include/trace/events/net.h
> > @@ -303,6 +303,29 @@ DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_list_exit,
> > TP_ARGS(ret)
> > );
> >
>
> I would have put this next to net_dev_xmit trace event declaration.
>

Sounds reasonable; that would make it slightly easier to find. I will send v2.

Thanks.