Subject: Re: r8169 driver from kernel 5.0 crashing - napi_consume_skb
From: Heiner Kallweit
To: VDR User, Alexander Duyck
Cc: netdev@vger.kernel.org
Date: Sat, 16 Mar 2019 16:10:26 +0100

On 16.03.2019 15:38, VDR User wrote:
>> Part of the issue though is that we don't know how reliable that test
>> was. I believe Derek hasn't had any crashes, but he wasn't confident
>> that it had actually resolved the issue.
>
> Previously I thought I could easily & consistently reproduce the crash
> but the more testing I did, the more I realized that wasn't the case.
> That's why my confidence was low in that reversing commit 5317d5c6d47e
> ("r8169: use napi_consume_skb where possible") fixed it. I felt like I
> needed to do a lot more testing over the weekend to be sure. But, I
> can now confirm that reversing that commit did not solve the problem.
> I didn't ifdown/ifup after the crash so the nic eventually recovered
> on its own I guess.
> The `ethtool -S` output is:
>
> NIC statistics:
>     tx_packets: 5370650
>     rx_packets: 57340787
>     tx_errors: 0
>     rx_errors: 0
>     rx_missed: 26
>     align_errors: 0
>     tx_single_collisions: 0
>     tx_multi_collisions: 0
>     unicast: 57332905
>     broadcast: 6409
>     multicast: 1473
>     tx_aborted: 0
>     tx_underrun: 0
>
> The dmesg log looks the same as before:
>
> [95579.984062] ------------[ cut here ]------------
> [95579.984142] NETDEV WATCHDOG: enp4s0 (r8169): transmit queue 0 timed out
> [95579.984224] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:461
> dev_watchdog+0x1bb/0x1e0
> [95579.984276] Modules linked in: snd_hda_codec_hdmi
> snd_hda_codec_realtek snd_hda_codec_generic ohci_pci snd_hda_intel
> snd_hda_codec snd_hwdep xhci_pci ohci_hcd ehci_pci xhci_hcd ehci_hcd
> usbcore snd_hda_core usb_common snd_pcm snd_timer snd soundcore nfsd
> auth_rpcgss oid_registry lockd grace sunrpc ip_tables x_tables ipv6
> [95579.984354] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.0.0-amd #1
> [95579.984387] Hardware name: ECS A75F-A/A75F-A, BIOS 4.6.5 09/14/2011
> [95579.984422] EIP: dev_watchdog+0x1bb/0x1e0
> [95579.984454] Code: 8b 50 3c 89 f8 e8 3d aa 0a 00 8b 7e f4 eb a4 89
> f8 c6 05 e7 1c 6d c1 01 e8 72 4f fd ff 53 50 57 68 78 05 66 c1 e8 25
> ad ba ff <0f> 0b 83 c4 10 eb c9 eb 1c 8d b4 26 00 00 00 00 8d b4 26 00
> 00 00
> [95579.986189] EAX: 0000003b EBX: 00000000 ECX: 00000800 EDX: 00000103
> [95579.986224] ESI: f4cc2264 EDI: f4cc2000 EBP: f4c99f74 ESP: f4c99f4c
> [95579.986259] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210296
> [95579.986292] CR0: 80050033 CR2: b7c644f0 CR3: 0dfd2000 CR4: 00000690
> [95579.986325] Call Trace:
> [95579.986356]
> [95579.986389] ? qdisc_put_unlocked+0x40/0x40
> [95579.986423] call_timer_fn+0x19/0xa0
> [95579.986456] run_timer_softirq+0x337/0x380
> [95579.986488] ? qdisc_put_unlocked+0x40/0x40
> [95579.986521] ? rcu_process_callbacks+0xcb/0x380
> [95579.986555] __do_softirq+0xd6/0x21c
> [95579.986586] ? __irqentry_text_end+0x18/0x18
> [95579.986619] call_on_stack+0x10/0x60
> [95579.986646]
> [95579.986674] ? irq_exit+0x91/0xc0
> [95579.986701] ? smp_apic_timer_interrupt+0x56/0xa0
> [95579.986731] ? apic_timer_interrupt+0xd5/0xdc
> [95579.986761] ? acpi_idle_enter_s2idle+0x60/0x60
> [95579.986790] ? cpuidle_enter_state+0x122/0x360
> [95579.986821] ? cpuidle_enter+0xf/0x20
> [95579.986850] ? call_cpuidle+0x1c/0x40
> [95579.986878] ? do_idle+0x1e6/0x220
> [95579.986906] ? cpu_startup_entry+0x25/0x40
> [95579.986934] ? start_secondary+0x1a5/0x220
> [95579.986963] ? startup_32_smp+0x15f/0x164
> [95579.986991] ---[ end trace 2e8d77bb3c9d2d45 ]---
>
> Please let me know if there's anything I can do to help.
> Derek
>
The other changes between 4.20 and 5.0 don't look like they could
cause the issue. And two critical ones have been reverted already.
So what would need to be done is bisecting the issue.

Heiner
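
For readers following the thread, the bisection suggested above could be driven roughly as in the sketch below. It is not taken from the message. It assumes a mainline git checkout at a hypothetical path (`KERNEL_TREE`), that v4.20 is a known-good kernel on this machine, and that v5.0 shows the watchdog timeout. The helper names `bisect_start`, `bisect_mark` and `bisect_finish` are made up for illustration; only the underlying `git bisect` subcommands are real.

```shell
#!/bin/sh
# Sketch only. Assumptions (not from the thread): KERNEL_TREE points at a
# mainline git checkout, v4.20 is known good, and v5.0 shows the timeout.
KERNEL_TREE="${KERNEL_TREE:-$HOME/linux}"   # hypothetical path
GOOD_REV="v4.20"
BAD_REV="v5.0"

# Start the bisection; git then checks out a commit between the two tags.
bisect_start() {
    git -C "$KERNEL_TREE" bisect start "$BAD_REV" "$GOOD_REV"
}

# After building, booting and soak-testing the checked-out kernel,
# record the verdict: "good" (no timeout seen) or "bad" (timeout seen).
bisect_mark() {
    case "$1" in
        good|bad) git -C "$KERNEL_TREE" bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad" >&2; return 2 ;;
    esac
}

# Once git names the first bad commit, print the log and restore the tree.
bisect_finish() {
    git -C "$KERNEL_TREE" bisect log &&
    git -C "$KERNEL_TREE" bisect reset
}
```

Because the timeout is reported as hard to reproduce on demand, a kernel should probably be marked good only after a long test run. A wrong "good" verdict steers the bisection into the half of the history that does not contain the offending commit.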