Date: Thu, 31 Jan 2019 15:23:06 +0800
From: David Chang
To: Heiner Kallweit
Cc: Peter Ceiley, Realtek linux nic maintainers, netdev@vger.kernel.org
Subject: Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
Message-ID: <20190131072306.GG25745@linux-kyyb.suse>
References: <172787aa-9ef5-091d-f70f-baf89fe0b1ee@gmail.com> <20190131023240.GF25745@linux-kyyb.suse> <4d832c16-8830-b746-a818-6026c2e6725c@gmail.com>
In-Reply-To: <4d832c16-8830-b746-a818-6026c2e6725c@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

Hi Heiner,

On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> Hi David, two more things:
>
> 1. Could you please test a recent linux-next kernel?
> 2. Please get a register dump (ethtool -d ) from 4.18 and 4.19
>    and compare them.

Sorry, I don't have the affected machine at hand. I will ask our user
to run the tests.

Thanks!
Regards,
David

>
> Heiner
>
>
> On 31.01.2019 07:21, Heiner Kallweit wrote:
> > David, thanks for the link to the bug ticket.
> > I think only a proper bisect can help to find the offending commit.
> >
> > Heiner
> >
> >
> > On 31.01.2019 03:32, David Chang wrote:
> >> Hi,
> >>
> >> We had a similar case here.
> >> - Realtek r8169 receive performance regression in kernel 4.19
> >> https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>
> >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >> The major symptom is a high rx_missed count.
> >>
> >>
> >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>> Hi Peter,
> >>>
> >>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >>> do the trick, can you also check with pcie_aspm.policy=performance.
> >>
> >> We will give it a try later.
> >>
> >>> And please check with "ethtool -S " whether the chip statistics
> >>> show a significant number of errors.
> >>>
> >>> If this doesn't help you may have to bisect to find the offending commit.
> >>
> >> We tried falling back the driver to a few previous commits, as follows,
> >> but with no luck.
> >>
> >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>
> >> Thanks,
> >> David Chang
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>> Hi Heiner,
> >>>>
> >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>> and this made no difference.
> >>>>
> >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>> confirm that this immediately resolved the issue and access to the NFS
> >>>> shares operated as expected.
> >>>>
> >>>> I presume this means it is an issue with the r8169 driver included in
> >>>> 4.19 onwards?
> >>>>
> >>>> To answer your last questions:
> >>>>
> >>>> Base Board Information
> >>>> Manufacturer: Alienware
> >>>> Product Name: 0PGRP5
> >>>> Version: A02
> >>>>
> >>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Peter.
> >>>>
> >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit wrote:
> >>>>>
> >>>>> Hi Peter,
> >>>>>
> >>>>> I think the vendor driver doesn't enable ASPM per default.
> >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>>> Few older systems seem to have issues with ASPM, what kind of
> >>>>> system / mainboard are you using? The RTL8168 is the onboard
> >>>>> network chip?
> >>>>>
> >>>>> Rgds, Heiner
> >>>>>
> >>>>>
> >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>> Hi Heiner,
> >>>>>>
> >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>> the issue.
> >>>>>> I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>> a good idea.
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Peter.
> >>>>>>
> >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit wrote:
> >>>>>>>
> >>>>>>> Hi Peter,
> >>>>>>>
> >>>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>>> What you could do:
> >>>>>>>
> >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>
> >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>
> >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>>
> >>>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>>> elsewhere in the network subsystem?
> >>>>>>>
> >>>>>>> Heiner
> >>>>>>>
> >>>>>>>
> >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>> Hi Heiner,
> >>>>>>>>
> >>>>>>>> Thanks for getting back to me.
> >>>>>>>>
> >>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>
> >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>
> >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>> troubleshoot this issue.
> >>>>>>>> Running the following
> >>>>>>>>
> >>>>>>>> netstat -s |grep retransmitted
> >>>>>>>>
> >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>> 4.19.18:
> >>>>>>>>
> >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>> the following:
> >>>>>>>> real 0m19.867s
> >>>>>>>> user 0m0.012s
> >>>>>>>> sys 0m0.036s
> >>>>>>>>
> >>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>> real 0m0.300s
> >>>>>>>> user 0m0.004s
> >>>>>>>> sys 0m0.007s
> >>>>>>>>
> >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>>
> >>>>>>>> dmesg XID:
> >>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>>
> >>>>>>>> # lspci -vv
> >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>> SERR-
> >>>>>>>> Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>> Interrupt: pin A routed to IRQ 19
> >>>>>>>> Region 0: I/O ports at d000 [size=256]
> >>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>> Capabilities: [40] Power Management version 3
> >>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>> Address: 0000000000000000 Data: 0000
> >>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>> <512ns, L1 <64us
> >>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>> OBFF Via message/WAKE#
> >>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>> OBFF Disabled
> >>>>>>>> AtomicOpsCtl: ReqEn-
> >>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>> Transmit Margin: Normal Operating Range,
> >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>> Compliance De-emphasis: -6dB
> >>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>> Vector table: BAR=4 offset=00000000
> >>>>>>>> PBA: BAR=4 offset=00000800
> >>>>>>>> Capabilities: [d0] Vital Product Data
> >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>> Not readable
> >>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>> Capabilities: [140 v1] Virtual Channel
> >>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
> >>>>>>>> Ctrl: ArbSelect=Fixed
> >>>>>>>> Status: InProgress-
> >>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>> Status: NegoPending- InProgress-
> >>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>> Max snoop latency: 71680ns
> >>>>>>>> Max no snoop latency: 71680ns
> >>>>>>>> Kernel driver in use: r8169
> >>>>>>>> Kernel modules: r8169
> >>>>>>>>
> >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> Peter.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit wrote:
> >>>>>>>>>
> >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>
> >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>
> >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>> situation.
> >>>>>>>>>>
> >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>
> >>>>>>>>>> lshw shows:
> >>>>>>>>>> description: Ethernet interface
> >>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>> physical id: 0
> >>>>>>>>>> bus info: pci@0000:03:00.0
> >>>>>>>>>> logical name: enp3s0
> >>>>>>>>>> version: 0c
> >>>>>>>>>> serial:
> >>>>>>>>>> size: 1Gbit/s
> >>>>>>>>>> capacity: 1Gbit/s
> >>>>>>>>>> width: 64 bits
> >>>>>>>>>> clock: 33MHz
> >>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>> resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>
> >>>>>>>>>> Kind Regards,
> >>>>>>>>>>
> >>>>>>>>>> Peter.
> >>>>>>>>>>
> >>>>>>>>> Hi Peter,
> >>>>>>>>>
> >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>
> >>>>>>>>> - Can you provide any measurements?
> >>>>>>>>> - iperf results before and after
> >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>
> >>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
> >>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>
> >>>>>>>>> Heiner
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
> >
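
For whoever ends up running the register-dump comparison requested at the top of this mail, here is a minimal sketch of one way to do it, assuming the "ethtool -d" output from each kernel has been saved to a text file first. The function name and the sample register lines are made up for illustration; nothing here is part of ethtool or the r8169 driver, and the exact dump layout depends on the ethtool version.

```python
import difflib


def diff_register_dumps(dump_old: str, dump_new: str) -> list[str]:
    """Return only the lines that differ between two saved register dumps.

    Lines found only in the old dump are prefixed with "- ",
    lines found only in the new dump with "+ ".
    """
    old_lines = [line.rstrip() for line in dump_old.splitlines()]
    new_lines = [line.rstrip() for line in dump_new.splitlines()]
    return [
        line
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith(("- ", "+ "))
    ]


if __name__ == "__main__":
    # Made-up sample data, NOT real r8169 register values.
    sample_418 = "0x00: 00 11 22 33\n0x40: 00 0f 80 07\n"
    sample_419 = "0x00: 00 11 22 33\n0x40: 00 0f 80 27\n"
    for changed in diff_register_dumps(sample_418, sample_419):
        print(changed)
```

In practice the two strings would be read from the saved dump files (file names are up to the tester). Plain "diff -u" on the two files gives the same information; the sketch just filters out the unchanged registers.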