Subject: Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
To: Peter Ceiley, Realtek linux nic maintainers
Cc: netdev@vger.kernel.org
From: Heiner Kallweit
Date: Tue, 29 Jan 2019 07:16:25 +0100

Hi Peter,

at first glance it doesn't look like a typical driver issue.
What you could do:
- Test the r8169.c from 4.18 on top of 4.19.
- Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
- Bisect between 4.18 and 4.19 to find the offending commit.

Any specific reason why you think root cause is in the driver and
not elsewhere in the network subsystem?

Heiner

On 28.01.2019 23:10, Peter Ceiley wrote:
> Hi Heiner,
>
> Thanks for getting back to me.
>
> No, I don't use jumbo packets.
>
> Bandwidth is *generally* good, and iperf results to my NAS provide
> over 900 Mbits/s in both circumstances. The issue seems to appear when
> establishing a connection and is most notable, for example, on my
> mounted NFS shares where it takes seconds (up to 10's of seconds on
> larger directories) to list the contents of each directory. Once a
> transfer begins on a file, I appear to get good bandwidth.
>
> I'm unsure of the best scientific data to provide you in order to
> troubleshoot this issue. Running the following
>
> netstat -s |grep retransmitted
>
> shows a steady increase in retransmitted segments each time I list the
> contents of a remote directory, for example, running 'ls' on a
> directory containing 345 media files did the following using kernel
> 4.19.18:
>
> increased retransmitted segments by 21 and the 'time' command showed
> the following:
> real 0m19.867s
> user 0m0.012s
> sys 0m0.036s
>
> The same command shows no retransmitted segments running kernel
> 4.18.16 and 'time' showed:
> real 0m0.300s
> user 0m0.004s
> sys 0m0.007s
>
> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>
> dmesg XID:
> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>
> # lspci -vv
> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> SERR- Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 19
> Region 0: I/O ports at d000 [size=256]
> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> Capabilities: [40] Power Management version 3
> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> Address: 0000000000000000 Data: 0000
> Capabilities: [70] Express (v2) Endpoint, MSI 01
> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> <512ns, L1 <64us
> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> SlotPowerLimit 10.000W
> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> Latency L0s unlimited, L1 <64us
> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> OBFF Via message/WAKE#
> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> OBFF Disabled
> AtomicOpsCtl: ReqEn-
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range,
> EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> LnkSta2: Current De-emphasis Level: -6dB,
> EqualizationComplete-, EqualizationPhase1-
> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> Vector table: BAR=4 offset=00000000
> PBA: BAR=4 offset=00000800
> Capabilities: [d0] Vital Product Data
> pcilib: sysfs_read_vpd: read failed: Input/output error
> Not readable
> Capabilities: [100 v1] Advanced Error Reporting
> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> ECRCChkCap+ ECRCChkEn-
> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> HeaderLog: 00000000 00000000 00000000 00000000
> Capabilities: [140 v1] Virtual Channel
> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> Arb: Fixed- WRR32- WRR64- WRR128-
> Ctrl: ArbSelect=Fixed
> Status: InProgress-
> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> Status: NegoPending- InProgress-
> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> Capabilities: [170 v1] Latency Tolerance Reporting
> Max snoop latency: 71680ns
> Max no snoop latency: 71680ns
> Kernel driver in use: r8169
> Kernel modules: r8169
>
> Please let me know if you have any other ideas in terms of testing.
>
> Thanks!
>
> Peter.
>
> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit wrote:
>>
>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>> Hi,
>>>
>>> I have been experiencing very poor network performance since Kernel
>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>
>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>> 4.20.4 & 4.19.18).
>>>
>>> If someone could guide me in the right direction, I'm happy to help
>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>> issue related to loading of the PHY driver, however, my symptoms
>>> differ in that I still have a network connection. I have attempted to
>>> reload the driver on a running system, but this does not improve the
>>> situation.
>>>
>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>
>>> lshw shows:
>>> description: Ethernet interface
>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>> vendor: Realtek Semiconductor Co., Ltd.
>>> physical id: 0
>>> bus info: pci@0000:03:00.0
>>> logical name: enp3s0
>>> version: 0c
>>> serial:
>>> size: 1Gbit/s
>>> capacity: 1Gbit/s
>>> width: 64 bits
>>> clock: 33MHz
>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>> 1000bt-fd autonegotiation
>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>> resources: irq:19 ioport:d000(size=256)
>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>
>>> Kind Regards,
>>>
>>> Peter.
>>>
>> Hi Peter,
>>
>> the description "poor network performance" is quite vague, therefore:
>>
>> - Can you provide any measurements?
>>   - iperf results before and after
>>   - statistics about dropped packets (rx and/or tx)
>> - Do you use jumbo packets?
>>
>> Also helpful would be a "lspci -vv" output for the network card and
>> the dmesg output line with the chip XID.
>>
>> Heiner
>
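
The measurement described in the thread (retransmitted segments and wall-clock time around a single command such as 'ls') could be captured roughly as in the sketch below. This is only an illustration under stated assumptions: it reads the RetransSegs field from the Tcp rows of /proc/net/snmp instead of parsing "netstat -s", and the function names and the sample text are made up here, not taken from the thread.

```python
import subprocess
import time

SNMP_PATH = "/proc/net/snmp"  # assumption: Linux with procfs mounted at /proc

# Made-up sample in the /proc/net/snmp layout: a header row, then a value row.
SAMPLE_SNMP = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs "
    "OutRsts InCsumErrors\n"
    "Tcp: 1 200 120000 -1 10 2 0 0 3 1000 900 21 0 5 0\n"
)


def parse_retrans_segs(snmp_text):
    """Return the TCP RetransSegs counter from /proc/net/snmp-style text."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    if len(rows) < 2:
        raise ValueError("no Tcp header/value rows found")
    header, values = rows[0], rows[1]
    # Both rows start with "Tcp:", so column positions line up one to one.
    return int(values[header.index("RetransSegs")])


def read_retrans_segs(path=SNMP_PATH):
    """Read the current system-wide RetransSegs counter."""
    with open(path) as f:
        return parse_retrans_segs(f.read())


def measure(cmd):
    """Run cmd (a list of arguments); return (elapsed seconds, RetransSegs delta)."""
    before = read_retrans_segs()
    start = time.monotonic()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
    elapsed = time.monotonic() - start
    return elapsed, read_retrans_segs() - before
```

Calling measure(["ls", "/path/to/nfs/dir"]) (a hypothetical path) once under 4.18 and once under 4.19 would give a pair of numbers comparable to the ones quoted above. The counter is system-wide, so unrelated traffic during the run can inflate the delta.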