Subject: Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: Heiner Kallweit
To: Peter Ceiley
Cc: Realtek linux nic maintainers, netdev@vger.kernel.org
Date: Wed, 30 Jan 2019 20:15:45 +0100
X-Mailing-List: netdev@vger.kernel.org

Hi Peter,

recently I had a case where pcie_aspm=off for whatever reason didn't do
the trick, so can you also check with pcie_aspm.policy=performance?
And please check with "ethtool -S <interface>" whether the chip
statistics show a significant number of errors.
If this doesn't help you may have to bisect to find the offending commit.

Heiner

On 30.01.2019 10:59, Peter Ceiley wrote:
> Hi Heiner,
>
> I tried disabling ASPM using the pcie_aspm=off kernel parameter
> and this made no difference.
>
> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> subsequently loaded the module in the running 4.19.18 kernel. I can
> confirm that this immediately resolved the issue and access to the NFS
> shares operated as expected.
>
> I presume this means it is an issue with the r8169 driver included in
> 4.19 onwards?
>
> To answer your last questions:
>
> Base Board Information
>     Manufacturer: Alienware
>     Product Name: 0PGRP5
>     Version: A02
>
> ... and yes, the RTL8168 is the onboard network chip.
>
> Regards,
>
> Peter.
>
> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit wrote:
>>
>> Hi Peter,
>>
>> I think the vendor driver doesn't enable ASPM by default,
>> so it's worth a try to disable ASPM in the BIOS or via sysfs.
>> A few older systems seem to have issues with ASPM. What kind of
>> system / mainboard are you using? Is the RTL8168 the onboard
>> network chip?
>>
>> Rgds, Heiner
>>
>>
>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> Thanks, I'll do some more testing. It might not be the driver - I
>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>> a good idea.
>>>
>>> Cheers,
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> At first glance it doesn't look like a typical driver issue.
>>>> What you could do:
>>>>
>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>
>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>
>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>
>>>> Any specific reason why you think the root cause is in the driver and
>>>> not elsewhere in the network subsystem?
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> Thanks for getting back to me.
>>>>>
>>>>> No, I don't use jumbo packets.
>>>>>
>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>> over 900 Mbits/s in both circumstances.
>>>>> The issue seems to appear when establishing a connection and is most
>>>>> notable, for example, on my mounted NFS shares, where it takes seconds
>>>>> (up to tens of seconds on larger directories) to list the contents of
>>>>> each directory. Once a transfer begins on a file, I appear to get good
>>>>> bandwidth.
>>>>>
>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>> troubleshoot this issue. Running the following
>>>>>
>>>>> netstat -s | grep retransmitted
>>>>>
>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>> contents of a remote directory. For example, running 'ls' on a
>>>>> directory containing 345 media files with kernel 4.19.18 increased the
>>>>> retransmitted segments by 21, and the 'time' command showed:
>>>>> real 0m19.867s
>>>>> user 0m0.012s
>>>>> sys  0m0.036s
>>>>>
>>>>> The same command shows no retransmitted segments running kernel
>>>>> 4.18.16, and 'time' showed:
>>>>> real 0m0.300s
>>>>> user 0m0.004s
>>>>> sys  0m0.007s
>>>>>
>>>>> ifconfig shows no RX/TX errors or dropped packets in either case.
>>>>>
>>>>> dmesg XID:
>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>
>>>>> # lspci -vv
>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>   Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
>>>>>   Latency: 0, Cache Line Size: 64 bytes
>>>>>   Interrupt: pin A routed to IRQ 19
>>>>>   Region 0: I/O ports at d000 [size=256]
>>>>>   Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>   Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>   Capabilities: [40] Power Management version 3
>>>>>     Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>     Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>   Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>     Address: 0000000000000000  Data: 0000
>>>>>   Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>     DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>>>>>       ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
>>>>>     DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>       MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>     DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>     LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
>>>>>       ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>     LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>       ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>     LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>       TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>     DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
>>>>>       AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>     DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
>>>>>       AtomicOpsCtl: ReqEn-
>>>>>     LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>       Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>>>>>       Compliance De-emphasis: -6dB
>>>>>     LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>>>>>       EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>   Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>     Vector table: BAR=4 offset=00000000
>>>>>     PBA: BAR=4 offset=00000800
>>>>>   Capabilities: [d0] Vital Product Data
>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>     Not readable
>>>>>   Capabilities: [100 v1] Advanced Error Reporting
>>>>>     UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>     UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>     UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>     CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>     CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>     AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>>>>>       MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>     HeaderLog: 00000000 00000000 00000000 00000000
>>>>>   Capabilities: [140 v1] Virtual Channel
>>>>>     Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>     Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>     Ctrl: ArbSelect=Fixed
>>>>>     Status: InProgress-
>>>>>     VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>       Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>       Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>       Status: NegoPending- InProgress-
>>>>>   Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>   Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>     Max snoop latency: 71680ns
>>>>>     Max no snoop latency: 71680ns
>>>>>   Kernel driver in use: r8169
>>>>>   Kernel modules: r8169
>>>>>
>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Peter.
>>>>>
>>>>>
>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit wrote:
>>>>>>
>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>
>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>
>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>> issue related to loading of the PHY driver; however, my symptoms
>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>> situation.
>>>>>>>
>>>>>>> Using the proprietary r8168 driver returns my device to proper
>>>>>>> working order.
>>>>>>>
>>>>>>> lshw shows:
>>>>>>>   description: Ethernet interface
>>>>>>>   product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>   vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>   physical id: 0
>>>>>>>   bus info: pci@0000:03:00.0
>>>>>>>   logical name: enp3s0
>>>>>>>   version: 0c
>>>>>>>   serial:
>>>>>>>   size: 1Gbit/s
>>>>>>>   capacity: 1Gbit/s
>>>>>>>   width: 64 bits
>>>>>>>   clock: 33MHz
>>>>>>>   capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>> 1000bt-fd autonegotiation
>>>>>>>   configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>   resources: irq:19 ioport:d000(size=256)
>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>
>>>>>> - Can you provide any measurements?
>>>>>>   - iperf results before and after
>>>>>>   - statistics about dropped packets (rx and/or tx)
>>>>>> - Do you use jumbo packets?
>>>>>>
>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>> the dmesg output line with the chip XID.
>>>>>>
>>>>>> Heiner
>>>>>
>>>>
>>>
>>
>
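For reference, a rough sketch of how the checks suggested in this thread
could be run, assuming the enp3s0 interface name reported by lshw above;
the exact boot-loader steps and kernel versions are examples, not taken
from the thread:

    # Boot with ASPM forced to the performance policy rather than fully off
    # (add pcie_aspm.policy=performance to the kernel command line), then
    # confirm which policy is active:
    cat /sys/module/pcie_aspm/parameters/policy

    # Check the chip statistics for a significant error count:
    ethtool -S enp3s0

    # If neither helps, bisect between the last good and the first bad
    # release, building and booting the kernel git checks out at each step:
    git bisect start
    git bisect bad v4.19
    git bisect good v4.18
    git bisect good   # or "git bisect bad", after testing each build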