Subject: Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: Heiner Kallweit
To: David Chang
Cc: Realtek linux nic maintainers, netdev@vger.kernel.org
Date: Tue, 5 Feb 2019 19:53:54 +0100
References: <172787aa-9ef5-091d-f70f-baf89fe0b1ee@gmail.com> <20190131023240.GF25745@linux-kyyb.suse> <856b3a75-5daf-6ce8-7fa3-0405e3cefe97@gmail.com>
In-Reply-To: <856b3a75-5daf-6ce8-7fa3-0405e3cefe97@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

By the way: I can't reproduce the issue on an RTL8168g, so it doesn't
seem to be an issue with generic code in the driver. I would assume
it's some kind of incompatibility between activated chip settings
(ASPM etc.) and certain systems.

Heiner

On 05.02.2019 19:50, Heiner Kallweit wrote:
> Hi David,
>
> meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
>
> Symptom there is also a significant number of rx_missed packets.
> Could you try what I mentioned there last:
> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> end of rtl_hw_start_8168h_1() being disabled.
>
> Heiner
>
>
> On 31.01.2019 03:32, David Chang wrote:
>> Hi,
>>
>> We had a similar case here.
>> - Realtek r8169 receive performance regression in kernel 4.19
>>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>
>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>> The major symptom is a high rx_missed count.
>>
>>
>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>> Hi Peter,
>>>
>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>>> do the trick, can you also check with pcie_aspm.policy=performance.
>>
>> We will give it a try later.
>>
>>> And please check with "ethtool -S " whether the chip statistics
>>> show a significant number of errors.
>>>
>>> If this doesn't help you may have to bisect to find the offending commit.
>>
>> We tried falling back the driver to a few previous commits, as follows,
>> but with no luck.
>>
>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>
>> Thanks,
>> David Chang
>>
>>>
>>> Heiner
>>>
>>>
>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>> Hi Heiner,
>>>>
>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>> and this made no difference.
>>>>
>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>> confirm that this immediately resolved the issue and access to the NFS
>>>> shares operated as expected.
>>>>
>>>> I presume this means it is an issue with the r8169 driver included in
>>>> 4.19 onwards?
>>>>
>>>> To answer your last questions:
>>>>
>>>> Base Board Information
>>>> Manufacturer: Alienware
>>>> Product Name: 0PGRP5
>>>> Version: A02
>>>>
>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>
>>>> Regards,
>>>>
>>>> Peter.
>>>>
>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> I think the vendor driver doesn't enable ASPM by default.
>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>> A few older systems seem to have issues with ASPM, what kind of
>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>> network chip?
>>>>>
>>>>> Rgds, Heiner
>>>>>
>>>>>
>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>> Hi Heiner,
>>>>>>
>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>> a good idea.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Peter.
>>>>>>
>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit wrote:
>>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> at first glance it doesn't look like a typical driver issue.
>>>>>>> What you could do:
>>>>>>>
>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>
>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>
>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>
>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>> elsewhere in the network subsystem?
>>>>>>>
>>>>>>> Heiner
>>>>>>>
>>>>>>>
>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>> Hi Heiner,
>>>>>>>>
>>>>>>>> Thanks for getting back to me.
>>>>>>>>
>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>
>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>> mounted NFS shares where it takes seconds (up to tens of seconds on
>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>
>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>
>>>>>>>> netstat -s |grep retransmitted
>>>>>>>>
>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>> 4.19.18:
>>>>>>>>
>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>> the following:
>>>>>>>> real 0m19.867s
>>>>>>>> user 0m0.012s
>>>>>>>> sys 0m0.036s
>>>>>>>>
>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>> real 0m0.300s
>>>>>>>> user 0m0.004s
>>>>>>>> sys 0m0.007s
>>>>>>>>
>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>
>>>>>>>> dmesg XID:
>>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>
>>>>>>>> # lspci -vv
>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>> SERR-
>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>         Address: 0000000000000000 Data: 0000
>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>         DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>> <512ns, L1 <64us
>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>         DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>         DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>         LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>         LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>         LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>> OBFF Disabled
>>>>>>>>             AtomicOpsCtl: ReqEn-
>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>             Transmit Margin: Normal Operating Range,
>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>             Compliance De-emphasis: -6dB
>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>         Not readable
>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>         UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>         UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>         UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>         CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>         CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>         AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>>         Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>         Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>>>         Ctrl: ArbSelect=Fixed
>>>>>>>>         Status: InProgress-
>>>>>>>>         VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>             Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>             Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>             Status: NegoPending- InProgress-
>>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>         Max snoop latency: 71680ns
>>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>>     Kernel driver in use: r8169
>>>>>>>>     Kernel modules: r8169
>>>>>>>>
>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit wrote:
>>>>>>>>>
>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>
>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>
>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>> situation.
>>>>>>>>>>
>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>
>>>>>>>>>> lshw shows:
>>>>>>>>>> description: Ethernet interface
>>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>> physical id: 0
>>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>>> logical name: enp3s0
>>>>>>>>>> version: 0c
>>>>>>>>>> serial:
>>>>>>>>>> size: 1Gbit/s
>>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>>> width: 64 bits
>>>>>>>>>> clock: 33MHz
>>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>
>>>>>>>>>> Kind Regards,
>>>>>>>>>>
>>>>>>>>>> Peter.
>>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>
>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>   - iperf results before and after
>>>>>>>>>   - statistics about dropped packets (rx and/or tx)
>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>
>>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>
>>>>>>>>> Heiner
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
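
For anyone repeating the measurement Peter describes in this thread
(retransmitted segments and wall time before and after listing an NFS
directory), here is a minimal sketch in Python. It is not from the thread.
It assumes a Linux system that exposes the system-wide TCP counters in
/proc/net/snmp (the RetransSegs field); the function names and the example
path are made up for illustration. Because the counter is system-wide,
unrelated traffic during the run is counted as well.

```python
"""Sketch: TCP retransmissions and wall time around one command (Linux only)."""
import subprocess
import sys
import time

SNMP_PATH = "/proc/net/snmp"  # assumption: kernel exposes TCP counters here


def read_retrans_segs(snmp_text):
    """Return the system-wide RetransSegs counter from /proc/net/snmp text."""
    # The file carries two "Tcp:" rows: field names first, then the values.
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("no Tcp: header/value pair found")
    names, values = rows[0], rows[1]
    return int(values[names.index("RetransSegs")])


def measure(cmd, snmp_path=SNMP_PATH):
    """Run cmd; return (retransmitted segments added, elapsed seconds)."""
    with open(snmp_path) as f:
        before = read_retrans_segs(f.read())
    start = time.monotonic()
    # Output is discarded; only the counters and the timing matter here.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
    elapsed = time.monotonic() - start
    with open(snmp_path) as f:
        after = read_retrans_segs(f.read())
    return after - before, elapsed


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # e.g. python3 retrans.py ls /mnt/nfs/media   (hypothetical path)
        delta, secs = measure(sys.argv[1:])
        print(f"retransmitted segments: +{delta}, elapsed: {secs:.3f}s")
    else:
        print("usage: retrans.py COMMAND [ARGS...]")
```

Running it once on a 4.18 kernel and once on a 4.19 kernel against the same
directory should give numbers comparable to the ones quoted above, provided
the link is otherwise idle.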