* r8169 Driver - Poor Network Performance Since Kernel 4.19
@ 2019-01-28 11:13 Peter Ceiley
2019-01-28 18:28 ` Heiner Kallweit
0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-28 11:13 UTC (permalink / raw)
To: Realtek linux nic maintainers, Heiner Kallweit; +Cc: netdev
Hi,
I have been experiencing very poor network performance since Kernel
4.19 and I'm confident it's related to the r8169 driver.
I have no issue with kernel versions 4.18 and prior. I am experiencing
this issue in kernels 4.19 and 4.20 (currently running/testing with
4.20.4 & 4.19.18).
If someone could guide me in the right direction, I'm happy to help
troubleshoot this issue. Note that I have been keeping an eye on one
issue related to loading of the PHY driver; however, my symptoms
differ in that I still have a network connection. I have attempted to
reload the driver on a running system, but this does not improve the
situation.
Using the proprietary r8168 driver returns my device to proper working order.
lshw shows:
description: Ethernet interface
product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
vendor: Realtek Semiconductor Co., Ltd.
physical id: 0
bus info: pci@0000:03:00.0
logical name: enp3s0
version: 0c
serial:
size: 1Gbit/s
capacity: 1Gbit/s
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress msix vpd bus_master cap_list
ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=r8169
duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
resources: irq:19 ioport:d000(size=256)
memory:f7b00000-f7b00fff memory:f2100000-f2103fff
Kind Regards,
Peter.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-28 11:13 r8169 Driver - Poor Network Performance Since Kernel 4.19 Peter Ceiley
@ 2019-01-28 18:28 ` Heiner Kallweit
2019-01-28 22:10 ` Peter Ceiley
0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-28 18:28 UTC (permalink / raw)
To: Peter Ceiley, Realtek linux nic maintainers; +Cc: netdev
Hi Peter,
the description "poor network performance" is quite vague, therefore:
- Can you provide any measurements?
  - iperf results before and after
  - statistics about dropped packets (rx and/or tx)
- Do you use jumbo packets?
Also helpful would be the "lspci -vv" output for the network card and
the dmesg output line with the chip XID.
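For anyone collecting the same two items, a minimal sketch follows. It assumes the NIC sits at PCI address 03:00.0, as in the lshw output, and that the driver logs a probe line containing "XID <hex>". The helper name xid_from_dmesg is made up for illustration.

```shell
#!/bin/sh
# Pull the chip XID out of an r8169 probe line read from stdin.
# xid_from_dmesg is a hypothetical helper name, not part of any tool.
xid_from_dmesg() {
    sed -n 's/.*XID \([0-9a-fA-F]*\).*/\1/p'
}

# Typical use (reading the kernel log may need root):
#   dmesg | grep -i r8169 | xid_from_dmesg
#   lspci -vv -s 03:00.0     # 03:00.0 is the address from the lshw output
```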
Heiner
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-28 18:28 ` Heiner Kallweit
@ 2019-01-28 22:10 ` Peter Ceiley
2019-01-29 6:16 ` Heiner Kallweit
0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-28 22:10 UTC (permalink / raw)
To: Heiner Kallweit, Realtek linux nic maintainers; +Cc: netdev
Hi Heiner,
Thanks for getting back to me.
No, I don't use jumbo packets.
Bandwidth is *generally* good, and iperf results to my NAS provide
over 900 Mbits/s in both circumstances. The issue seems to appear when
establishing a connection and is most notable, for example, on my
mounted NFS shares where it takes seconds (up to tens of seconds on
larger directories) to list the contents of each directory. Once a
transfer begins on a file, I appear to get good bandwidth.
I'm unsure of the best scientific data to provide you in order to
troubleshoot this issue. Running the following
netstat -s |grep retransmitted
shows a steady increase in retransmitted segments each time I list the
contents of a remote directory. For example, on kernel 4.19.18, running
'ls' on a directory containing 345 media files increased the
retransmitted segments by 21, and the 'time' command showed:
real 0m19.867s
user 0m0.012s
sys 0m0.036s
The same command shows no retransmitted segments running kernel
4.18.16 and 'time' showed:
real 0m0.300s
user 0m0.004s
sys 0m0.007s
ifconfig does not show any RX/TX errors nor dropped packets in either case.
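The retransmit check described above can be scripted so the same test runs unchanged on each kernel. This is only a sketch, assuming net-tools netstat is installed; the helper names and the mount point are placeholders, not anything from this report.

```shell
#!/bin/sh
# Hypothetical helpers for repeating the measurement; names are illustrative.

# Read `netstat -s` output on stdin and print the TCP retransmit counter.
# The loose pattern tolerates spelling differences between netstat versions.
retrans_from_stats() {
    awk '/segments retransmit/ { print $1; found = 1; exit }
         END { if (!found) print 0 }'
}

# Run a command and print how many segments were retransmitted meanwhile.
# The counter is system-wide, so keep other traffic quiet during the run.
retrans_delta() {
    before=$(netstat -s | retrans_from_stats)
    "$@" >/dev/null
    after=$(netstat -s | retrans_from_stats)
    echo $((after - before))
}

# Example (the mount point is a placeholder):
#   retrans_delta ls /mnt/nfs/media
```

Because the counter covers all TCP traffic on the host, a non-zero result is only meaningful when little else is using the network during the run.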
dmesg XID:
[ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
# lspci -vv
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
Region 0: I/O ports at d000 [size=256]
Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
SlotPowerLimit 10.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
Latency L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
OBFF Via message/WAKE#
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
Not readable
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
Capabilities: [170 v1] Latency Tolerance Reporting
Max snoop latency: 71680ns
Max no snoop latency: 71680ns
Kernel driver in use: r8169
Kernel modules: r8169
Please let me know if you have any other ideas in terms of testing.
Thanks!
Peter.
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-28 22:10 ` Peter Ceiley
@ 2019-01-29 6:16 ` Heiner Kallweit
2019-01-29 6:20 ` Peter Ceiley
0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-29 6:16 UTC (permalink / raw)
To: Peter Ceiley, Realtek linux nic maintainers; +Cc: netdev
Hi Peter,
At first glance, it doesn't look like a typical driver issue.
What you could do:
- Test the r8169.c from 4.18 on top of 4.19.
- Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
- Bisect between 4.18 and 4.19 to find the offending commit.
Any specific reason why you think the root cause is in the driver and not
elsewhere in the network subsystem?
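The bisect step might look roughly like the sketch below. It assumes a mainline git checkout where the tags v4.18 and v4.19 exist, and it treats any retransmitted segments during the directory-listing test as "bad". That pass/fail rule, the helper names, and the ~/src/linux path are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; names and the pass/fail rule are assumptions.

# Begin bisecting in kernel tree $1 between a good ($2) and a bad ($3) ref.
# Usage: start_bisect ~/src/linux v4.18 v4.19
start_bisect() {
    git -C "$1" bisect start "$3" "$2"   # git expects <bad> before <good>
}

# Map the retransmit count from one test run to a bisect verdict.
verdict() {
    if [ "$1" -gt 0 ]; then echo bad; else echo good; fi
}

# After building and booting each candidate kernel and repeating the test:
#   git -C ~/src/linux bisect "$(verdict 21)"    # marks this step bad
#   git -C ~/src/linux bisect log                # review the steps so far
#   git -C ~/src/linux bisect reset              # when finished
```

Appending "-- drivers/net/ethernet/realtek" to the "bisect start" command would restrict the search to commits touching the driver. That is quicker, but it would miss a culprit elsewhere in the network subsystem.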
Heiner
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-29 6:16 ` Heiner Kallweit
@ 2019-01-29 6:20 ` Peter Ceiley
2019-01-29 6:44 ` Heiner Kallweit
0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-29 6:20 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev
Hi Heiner,
Thanks, I'll do some more testing. It might not be the driver; I
assumed it was because using the r8168 driver 'resolves' the issue.
I'll see if I can test the 4.18 r8169.c on top of 4.19. This is a
good idea.
Cheers,
Peter.
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-29 6:20 ` Peter Ceiley
@ 2019-01-29 6:44 ` Heiner Kallweit
2019-01-30 9:59 ` Peter Ceiley
0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-29 6:44 UTC (permalink / raw)
To: Peter Ceiley; +Cc: Realtek linux nic maintainers, netdev
Hi Peter,
I think the vendor driver doesn't enable ASPM by default.
So it's worth a try to disable ASPM in the BIOS or via sysfs.
A few older systems seem to have issues with ASPM. What kind of
system / mainboard are you using? Is the RTL8168 the onboard
network chip?
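For the sysfs route, a minimal sketch follows. It assumes the kernel exposes /sys/module/pcie_aspm/parameters/policy with the active policy shown in square brackets; the helper names are illustrative. Writing the policy needs root and can be refused when the firmware keeps control of ASPM.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative.

# Print the active (bracketed) entry from a policy line on stdin, e.g.
# "[default] performance powersave powersupersave" -> "default".
active_aspm_policy() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Print the per-link ASPM state from `lspci -vv` output on stdin, e.g.
# "LnkCtl: ASPM L1 Enabled; RCB 64 bytes ..." -> "L1 Enabled".
link_aspm_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

policy_file=/sys/module/pcie_aspm/parameters/policy
if [ -r "$policy_file" ]; then
    echo "active ASPM policy: $(active_aspm_policy < "$policy_file")"
    # To turn ASPM off for the test (as root):
    #   echo performance > "$policy_file"
    # Then confirm on the NIC itself:
    #   lspci -vv -s 03:00.0 | link_aspm_state
fi
```

The "performance" policy is the one that disables ASPM. Checking the LnkCtl line afterwards shows whether the change actually reached the link.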
Rgds, Heiner
On 29.01.2019 07:20, Peter Ceiley wrote:
> Hi Heiner,
>
> Thanks, I'll do some more testing. It might not be the driver - I
> assumed it was due to the fact that using the r8168 driver 'resolves'
> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> a good idea.
>
> Cheers,
>
> Peter.
>
> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> Hi Peter,
>>
>> at a first glance it doesn't look like a typical driver issue.
>> What you could do:
>>
>> - Test the r8169.c from 4.18 on top of 4.19.
>>
>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>
>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>
>> Any specific reason why you think root cause is in the driver and not
>> elsewhere in the network subsystem?
>>
>> Heiner
>>
>>
>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> Thanks for getting back to me.
>>>
>>> No, I don't use jumbo packets.
>>>
>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>> establishing a connection and is most notable, for example, on my
>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>> larger directories) to list the contents of each directory. Once a
>>> transfer begins on a file, I appear to get good bandwidth.
>>>
>>> I'm unsure of the best scientific data to provide you in order to
>>> troubleshoot this issue. Running the following
>>>
>>> netstat -s |grep retransmitted
>>>
>>> shows a steady increase in retransmitted segments each time I list the
>>> contents of a remote directory, for example, running 'ls' on a
>>> directory containing 345 media files did the following using kernel
>>> 4.19.18:
>>>
>>> increased retransmitted segments by 21 and the 'time' command showed
>>> the following:
>>> real 0m19.867s
>>> user 0m0.012s
>>> sys 0m0.036s
>>>
>>> The same command shows no retransmitted segments running kernel
>>> 4.18.16 and 'time' showed:
>>> real 0m0.300s
>>> user 0m0.004s
>>> sys 0m0.007s
>>>
>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>
>>> dmesg XID:
>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>
>>> # lspci -vv
>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>> Latency: 0, Cache Line Size: 64 bytes
>>> Interrupt: pin A routed to IRQ 19
>>> Region 0: I/O ports at d000 [size=256]
>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>> Capabilities: [40] Power Management version 3
>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>> Address: 0000000000000000 Data: 0000
>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>> <512ns, L1 <64us
>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>> SlotPowerLimit 10.000W
>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>> Latency L0s unlimited, L1 <64us
>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>> OBFF Via message/WAKE#
>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>> OBFF Disabled
>>> AtomicOpsCtl: ReqEn-
>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>> Transmit Margin: Normal Operating Range,
>>> EnterModifiedCompliance- ComplianceSOS-
>>> Compliance De-emphasis: -6dB
>>> LnkSta2: Current De-emphasis Level: -6dB,
>>> EqualizationComplete-, EqualizationPhase1-
>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>> Vector table: BAR=4 offset=00000000
>>> PBA: BAR=4 offset=00000800
>>> Capabilities: [d0] Vital Product Data
>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>> Not readable
>>> Capabilities: [100 v1] Advanced Error Reporting
>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>> ECRCChkCap+ ECRCChkEn-
>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>> HeaderLog: 00000000 00000000 00000000 00000000
>>> Capabilities: [140 v1] Virtual Channel
>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>> Ctrl: ArbSelect=Fixed
>>> Status: InProgress-
>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>> Status: NegoPending- InProgress-
>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>> Max snoop latency: 71680ns
>>> Max no snoop latency: 71680ns
>>> Kernel driver in use: r8169
>>> Kernel modules: r8169
>>>
>>> Please let me know if you have any other ideas in terms of testing.
>>>
>>> Thanks!
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>> Hi,
>>>>>
>>>>> I have been experiencing very poor network performance since Kernel
>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>
>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>> 4.20.4 & 4.19.18).
>>>>>
>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>> differ in that I still have a network connection. I have attempted to
>>>>> reload the driver on a running system, but this does not improve the
>>>>> situation.
>>>>>
>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>
>>>>> lshw shows:
>>>>> description: Ethernet interface
>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>> physical id: 0
>>>>> bus info: pci@0000:03:00.0
>>>>> logical name: enp3s0
>>>>> version: 0c
>>>>> serial:
>>>>> size: 1Gbit/s
>>>>> capacity: 1Gbit/s
>>>>> width: 64 bits
>>>>> clock: 33MHz
>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>> 1000bt-fd autonegotiation
>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>> resources: irq:19 ioport:d000(size=256)
>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Peter.
>>>>>
>>>> Hi Peter,
>>>>
>>>> the description "poor network performance" is quite vague, therefore:
>>>>
>>>> - Can you provide any measurements?
>>>> - iperf results before and after
>>>> - statistics about dropped packets (rx and/or tx)
>>>> - Do you use jumbo packets?
>>>>
>>>> Also help would be a "lspci -vv" output for the network card and
>>>> the dmesg output line with the chip XID.
>>>>
>>>> Heiner
>>>
>>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-29 6:44 ` Heiner Kallweit
@ 2019-01-30 9:59 ` Peter Ceiley
2019-01-30 19:15 ` Heiner Kallweit
0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-30 9:59 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev
Hi Heiner,
I tried disabling ASPM using the pcie_aspm=off kernel parameter,
but this made no difference.
I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
subsequently loaded the module in the running 4.19.18 kernel. I can
confirm that this immediately resolved the issue and access to the NFS
shares operated as expected.
I presume this means it is an issue with the r8169 driver included in
4.19 onwards?
To answer your last questions:
Base Board Information
Manufacturer: Alienware
Product Name: 0PGRP5
Version: A02
... and yes, the RTL8168 is the onboard network chip.
Regards,
Peter.
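One way to put a number on the before/after comparison is to repeat the
retransmission check from earlier in this thread (netstat -s | grep
retransmitted) around a single directory listing, once with each driver
build. The sketch below is an illustration, not something posted in the
thread: it assumes the kernel's /proc/net/snmp file as the counter source
instead of netstat, the function names are hypothetical, and the sample
snapshots are made-up values chosen so that the difference matches the 21
retransmitted segments reported for 4.19.18.

```python
def retrans_segs(snmp_text: str) -> int:
    """Return the TCP RetransSegs counter from /proc/net/snmp-style text.

    The file holds pairs of lines per protocol: a header line naming the
    fields and a value line in the same order. Both start with "Tcp:".
    """
    tcp_lines = [ln.split() for ln in snmp_text.splitlines() if ln.startswith("Tcp:")]
    if len(tcp_lines) < 2:
        raise ValueError("expected a Tcp: header line and a Tcp: value line")
    header, values = tcp_lines[0], tcp_lines[1]
    return int(values[header.index("RetransSegs")])


def retrans_delta(before: str, after: str) -> int:
    """Retransmitted segments that occurred between two snapshots."""
    return retrans_segs(after) - retrans_segs(before)


# Made-up snapshots for illustration only (not measurements from the thread).
SAMPLE_BEFORE = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails "
    "EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors\n"
    "Tcp: 1 200 120000 -1 50 3 0 1 4 90000 80000 100 0 12 0\n"
)
# Same snapshot with RetransSegs raised from 100 to 121.
SAMPLE_AFTER = SAMPLE_BEFORE.replace(" 100 0 12 0", " 121 0 12 0")

print(retrans_delta(SAMPLE_BEFORE, SAMPLE_AFTER))
```

On a live system the two snapshots would come from reading /proc/net/snmp
immediately before and after the 'ls' on the NFS mount.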
On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> Hi Peter,
>
> I think the vendor driver doesn't enable ASPM per default.
> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> Few older systems seem to have issues with ASPM, what kind of
> system / mainboard are you using? The RTL8168 is the onboard
> network chip?
>
> Rgds, Heiner
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-30 9:59 ` Peter Ceiley
@ 2019-01-30 19:15 ` Heiner Kallweit
2019-01-31 2:32 ` David Chang
0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-30 19:15 UTC (permalink / raw)
To: Peter Ceiley; +Cc: Realtek linux nic maintainers, netdev
Hi Peter,
recently I had a case where pcie_aspm=off, for whatever reason, didn't
do the trick. Can you also check with pcie_aspm.policy=performance?
And please check with "ethtool -S <if>" whether the chip statistics
show a significant number of errors.
If this doesn't help you may have to bisect to find the offending commit.
Heiner
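Before comparing results, it may help to confirm what ASPM state the
system actually ended up in after booting with either parameter. The
sketch below is an illustration under stated assumptions, not part of the
original mail: it parses the text of /sys/module/pcie_aspm/parameters/policy
(where the active policy is shown in square brackets) and the LnkCtl line
from "lspci -vv". The function names are hypothetical, and the policy
sample string is made up; the LnkCtl sample is the line from the lspci
output posted earlier in this thread.

```python
import re


def active_aspm_policy(policy_text: str) -> str:
    """Return the bracketed (active) entry of the pcie_aspm policy file."""
    match = re.search(r"\[(\w+)\]", policy_text)
    if match is None:
        raise ValueError("no bracketed policy found")
    return match.group(1)


def lnkctl_aspm(lspci_text: str) -> str:
    """Return the ASPM field of the first LnkCtl line in 'lspci -vv' output."""
    match = re.search(r"LnkCtl:\s+ASPM ([^;]+);", lspci_text)
    if match is None:
        raise ValueError("no LnkCtl ASPM field found")
    return match.group(1).strip()


# Made-up policy file content for illustration.
SAMPLE_POLICY = "default performance [powersave] powersupersave\n"
# LnkCtl line as it appears in the lspci output posted in this thread.
SAMPLE_LSPCI = "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+"

print(active_aspm_policy(SAMPLE_POLICY))
print(lnkctl_aspm(SAMPLE_LSPCI))
```

If the LnkCtl field still reads "L1 Enabled" after the parameter change,
the link is still using ASPM L1 and the test has not really exercised the
ASPM-disabled case.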
On 30.01.2019 10:59, Peter Ceiley wrote:
> Hi Heiner,
>
> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> and this made no difference.
>
> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> subsequently loaded the module in the running 4.19.18 kernel. I can
> confirm that this immediately resolved the issue and access to the NFS
> shares operated as expected.
>
> I presume this means it is an issue with the r8169 driver included in
> 4.19 onwards?
>
> To answer your last questions:
>
> Base Board Information
> Manufacturer: Alienware
> Product Name: 0PGRP5
> Version: A02
>
> ... and yes, the RTL8168 is the onboard network chip.
>
> Regards,
>
> Peter.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-30 19:15 ` Heiner Kallweit
@ 2019-01-31 2:32 ` David Chang
2019-01-31 6:21 ` Heiner Kallweit
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: David Chang @ 2019-01-31 2:32 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev
Hi,
We had a similar case here.
- Realtek r8169 receive performance regression in kernel 4.19
https://bugzilla.suse.com/show_bug.cgi?id=1119649
kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
The major symptom is a high rx_missed count.
On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> Hi Peter,
>
> recently I had somebody where pcie_aspm=off for whatever reason didn't
> do the trick, can you also check with pcie_aspm.policy=performance.
We will give it a try later.
> And please check with "ethtool -S <if>" whether the chip statistics
> show a significant number of errors.
>
> If this doesn't help you may have to bisect to find the offending commit.
We tried falling back the driver to a few earlier commits, as follows,
but with no luck.
9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
Thanks,
David Chang
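The rx_missed symptom and the "ethtool -S <if>" check discussed above can
be combined into a quick filter over the chip statistics. The following is
only a sketch under stated assumptions: it expects the usual "name: value"
layout of "ethtool -S" output, the function names and keyword list are
hypothetical, and the sample counters are made up (only the rx_missed name
comes from this thread).

```python
def parse_ethtool_stats(text: str) -> dict:
    """Parse 'ethtool -S' style output into a {counter_name: int} dict.

    Lines without a numeric value after the last colon, such as the
    'NIC statistics:' heading, are skipped.
    """
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_error_counters(stats: dict,
                           keywords=("err", "missed", "aborted", "underrun")) -> dict:
    """Return counters whose name contains a keyword and whose value is not 0."""
    return {
        name: value
        for name, value in stats.items()
        if value != 0 and any(key in name for key in keywords)
    }


# Made-up sample output for illustration only.
SAMPLE = """NIC statistics:
     tx_packets: 1000
     rx_packets: 2000
     tx_errors: 0
     rx_errors: 0
     rx_missed: 789
     align_errors: 0
     tx_aborted: 0
     tx_underrun: 0
"""

print(nonzero_error_counters(parse_ethtool_stats(SAMPLE)))
```

Taking one snapshot before and one after a slow operation and comparing
the two dicts shows whether rx_missed grows during the stall or was
already non-zero beforehand.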
>
> Heiner
>
>
> On 30.01.2019 10:59, Peter Ceiley wrote:
> > Hi Heiner,
> >
> > I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> > and this made no difference.
> >
> > I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> > subsequently loaded the module in the running 4.19.18 kernel. I can
> > confirm that this immediately resolved the issue and access to the NFS
> > shares operated as expected.
> >
> > I presume this means it is an issue with the r8169 driver included in
> > 4.19 onwards?
> >
> > To answer your last questions:
> >
> > Base Board Information
> > Manufacturer: Alienware
> > Product Name: 0PGRP5
> > Version: A02
> >
> > ... and yes, the RTL8168 is the onboard network chip.
> >
> > Regards,
> >
> > Peter.
> >
> > On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> I think the vendor driver doesn't enable ASPM per default.
> >> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >> Few older systems seem to have issues with ASPM, what kind of
> >> system / mainboard are you using? The RTL8168 is the onboard
> >> network chip?
> >>
> >> Rgds, Heiner
> >>
> >>
> >> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> Thanks, I'll do some more testing. It might not be the driver - I
> >>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>> a good idea.
> >>>
> >>> Cheers,
> >>>
> >>> Peter.
> >>>
> >>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> at a first glance it doesn't look like a typical driver issue.
> >>>> What you could do:
> >>>>
> >>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>
> >>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>
> >>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>
> >>>> Any specific reason why you think root cause is in the driver and not
> >>>> elsewhere in the network subsystem?
> >>>>
> >>>> Heiner
> >>>>
> >>>>
> >>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>> Hi Heiner,
> >>>>>
> >>>>> Thanks for getting back to me.
> >>>>>
> >>>>> No, I don't use jumbo packets.
> >>>>>
> >>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>> establishing a connection and is most notable, for example, on my
> >>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>> larger directories) to list the contents of each directory. Once a
> >>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>
> >>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>> troubleshoot this issue. Running the following
> >>>>>
> >>>>> netstat -s |grep retransmitted
> >>>>>
> >>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>> directory containing 345 media files did the following using kernel
> >>>>> 4.19.18:
> >>>>>
> >>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>> the following:
> >>>>> real 0m19.867s
> >>>>> user 0m0.012s
> >>>>> sys 0m0.036s
> >>>>>
> >>>>> The same command shows no retransmitted segments running kernel
> >>>>> 4.18.16 and 'time' showed:
> >>>>> real 0m0.300s
> >>>>> user 0m0.004s
> >>>>> sys 0m0.007s
> >>>>>
> >>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>
> >>>>> dmesg XID:
> >>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>
> >>>>> # lspci -vv
> >>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>> Latency: 0, Cache Line Size: 64 bytes
> >>>>> Interrupt: pin A routed to IRQ 19
> >>>>> Region 0: I/O ports at d000 [size=256]
> >>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>> Capabilities: [40] Power Management version 3
> >>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>> Address: 0000000000000000 Data: 0000
> >>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>> <512ns, L1 <64us
> >>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>> SlotPowerLimit 10.000W
> >>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>> Latency L0s unlimited, L1 <64us
> >>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>> OBFF Via message/WAKE#
> >>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>> OBFF Disabled
> >>>>> AtomicOpsCtl: ReqEn-
> >>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>> Transmit Margin: Normal Operating Range,
> >>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>> Compliance De-emphasis: -6dB
> >>>>> LnkSta2: Current De-emphasis Level: -6dB,
> >>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>> Vector table: BAR=4 offset=00000000
> >>>>> PBA: BAR=4 offset=00000800
> >>>>> Capabilities: [d0] Vital Product Data
> >>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>> Not readable
> >>>>> Capabilities: [100 v1] Advanced Error Reporting
> >>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>> ECRCChkCap+ ECRCChkEn-
> >>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>> HeaderLog: 00000000 00000000 00000000 00000000
> >>>>> Capabilities: [140 v1] Virtual Channel
> >>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>> Arb: Fixed- WRR32- WRR64- WRR128-
> >>>>> Ctrl: ArbSelect=Fixed
> >>>>> Status: InProgress-
> >>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>> Status: NegoPending- InProgress-
> >>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>> Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>> Max snoop latency: 71680ns
> >>>>> Max no snoop latency: 71680ns
> >>>>> Kernel driver in use: r8169
> >>>>> Kernel modules: r8169
> >>>>>
> >>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>
> >>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>
> >>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>
> >>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>> situation.
> >>>>>>>
> >>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>
> >>>>>>> lshw shows:
> >>>>>>> description: Ethernet interface
> >>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>> vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>> physical id: 0
> >>>>>>> bus info: pci@0000:03:00.0
> >>>>>>> logical name: enp3s0
> >>>>>>> version: 0c
> >>>>>>> serial:
> >>>>>>> size: 1Gbit/s
> >>>>>>> capacity: 1Gbit/s
> >>>>>>> width: 64 bits
> >>>>>>> clock: 33MHz
> >>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>> 1000bt-fd autonegotiation
> >>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>> resources: irq:19 ioport:d000(size=256)
> >>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>
> >>>>>>> Kind Regards,
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>
> >>>>>> - Can you provide any measurements?
> >>>>>> - iperf results before and after
> >>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>> - Do you use jumbo packets?
> >>>>>>
> >>>>>> Also helpful would be a "lspci -vv" output for the network card and
> >>>>>> the dmesg output line with the chip XID.
> >>>>>>
> >>>>>> Heiner
> >>>>>
> >>>>
> >>>
> >>
> >
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 2:32 ` David Chang
@ 2019-01-31 6:21 ` Heiner Kallweit
2019-01-31 6:35 ` Heiner Kallweit
2019-02-02 12:25 ` Heiner Kallweit
2019-02-05 18:50 ` Heiner Kallweit
2 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-31 6:21 UTC (permalink / raw)
To: David Chang; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev
David, thanks for the link to the bug ticket.
I think only a proper bisect can help to find the offending commit.
Heiner
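
A bisect needs a repeatable good/bad verdict at every step. Below is a minimal sketch of such a check, assuming the symptom reported earlier in this thread (a slow NFS directory listing accompanied by TCP retransmissions). The function names, both thresholds, and the choice of the RetransSegs counter in /proc/net/snmp are assumptions made for illustration, not something the thread prescribes. The counter is system-wide, so unrelated traffic can inflate it.

```python
import os
import sys
import time


def parse_retrans_segs(snmp_text):
    """Return the TCP RetransSegs counter from /proc/net/snmp-style text."""
    rows = [line.split() for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    # The first "Tcp:" row lists field names, the second the matching values.
    names, values = rows[0], rows[1]
    return int(values[names.index("RetransSegs")])


def classify(elapsed_s, retrans_delta, max_seconds=2.0, max_retrans=5):
    """Map one measurement to a verdict for a manual 'git bisect good|bad'.

    Both thresholds are assumed values; tune them to the affected system.
    """
    if elapsed_s > max_seconds or retrans_delta > max_retrans:
        return "bad"
    return "good"


def measure(directory, snmp_path="/proc/net/snmp"):
    """Time a listing of `directory` (stat-ing each entry, as 'ls -l' would)
    and return (elapsed seconds, growth of the retransmit counter)."""
    with open(snmp_path) as f:
        before = parse_retrans_segs(f.read())
    start = time.monotonic()
    for entry in os.scandir(directory):
        entry.stat()
    elapsed = time.monotonic() - start
    with open(snmp_path) as f:
        after = parse_retrans_segs(f.read())
    return elapsed, after - before


if __name__ == "__main__" and len(sys.argv) == 2:
    # The directory argument is whatever NFS mount shows the problem.
    elapsed, delta = measure(sys.argv[1])
    print(f"{elapsed:.3f}s, {delta} retransmitted segments: "
          f"{classify(elapsed, delta)}")
```

After booting each kernel that git bisect checks out, the script can be run against the affected mount and the verdict passed to `git bisect good` or `git bisect bad` by hand. With the figures reported earlier in the thread, 19.867 s with 21 retransmissions classifies as bad and 0.300 s with none as good.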
On 31.01.2019 03:32, David Chang wrote:
> Hi,
>
> We had a similar case here.
> - Realtek r8169 receive performance regression in kernel 4.19
> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> The major symptom is a high rx_missed count.
>
>
> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>> Hi Peter,
>>
>> recently I had somebody for whom pcie_aspm=off, for whatever reason, didn't
>> do the trick. Can you also check with pcie_aspm.policy=performance?
>
> We will give it a try later.
>
>> And please check with "ethtool -S <if>" whether the chip statistics
>> show a significant number of errors.
>>
>> If this doesn't help you may have to bisect to find the offending commit.
>
> We tried falling back the driver to a few earlier commits, as follows,
> but with no luck.
>
> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>
> Thanks,
> David Chang
>
>>
>> Heiner
>>
>>
>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>> and this made no difference.
>>>
>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>> confirm that this immediately resolved the issue and access to the NFS
>>> shares operated as expected.
>>>
>>> I presume this means it is an issue with the r8169 driver included in
>>> 4.19 onwards?
>>>
>>> To answer your last questions:
>>>
>>> Base Board Information
>>> Manufacturer: Alienware
>>> Product Name: 0PGRP5
>>> Version: A02
>>>
>>> ... and yes, the RTL8168 is the onboard network chip.
>>>
>>> Regards,
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> I think the vendor driver doesn't enable ASPM per default.
>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>> A few older systems seem to have issues with ASPM. What kind of
>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>> network chip?
>>>>
>>>> Rgds, Heiner
>>>>
>>>>
>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>> a good idea.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>> What you could do:
>>>>>>
>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>
>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>
>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>
>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>> elsewhere in the network subsystem?
>>>>>>
>>>>>> Heiner
>>>>>>
>>>>>>
>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks for getting back to me.
>>>>>>>
>>>>>>> No, I don't use jumbo packets.
>>>>>>>
>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>
>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>> troubleshoot this issue. Running the following
>>>>>>>
>>>>>>> netstat -s |grep retransmitted
>>>>>>>
>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>> 4.19.18:
>>>>>>>
>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>> the following:
>>>>>>> real 0m19.867s
>>>>>>> user 0m0.012s
>>>>>>> sys 0m0.036s
>>>>>>>
>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>> 4.18.16 and 'time' showed:
>>>>>>> real 0m0.300s
>>>>>>> user 0m0.004s
>>>>>>> sys 0m0.007s
>>>>>>>
>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>
>>>>>>> dmesg XID:
>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>
>>>>>>> # lspci -vv
>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>> Latency: 0, Cache Line Size: 64 bytes
>>>>>>> Interrupt: pin A routed to IRQ 19
>>>>>>> Region 0: I/O ports at d000 [size=256]
>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>> Capabilities: [40] Power Management version 3
>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>> Address: 0000000000000000 Data: 0000
>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>> <512ns, L1 <64us
>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>> SlotPowerLimit 10.000W
>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>> OBFF Via message/WAKE#
>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>> OBFF Disabled
>>>>>>> AtomicOpsCtl: ReqEn-
>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>> Transmit Margin: Normal Operating Range,
>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>> Compliance De-emphasis: -6dB
>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>> Vector table: BAR=4 offset=00000000
>>>>>>> PBA: BAR=4 offset=00000800
>>>>>>> Capabilities: [d0] Vital Product Data
>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>> Not readable
>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>> Capabilities: [140 v1] Virtual Channel
>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>> Ctrl: ArbSelect=Fixed
>>>>>>> Status: InProgress-
>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>> Status: NegoPending- InProgress-
>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>> Max snoop latency: 71680ns
>>>>>>> Max no snoop latency: 71680ns
>>>>>>> Kernel driver in use: r8169
>>>>>>> Kernel modules: r8169
>>>>>>>
>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>
>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>
>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>> situation.
>>>>>>>>>
>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>
>>>>>>>>> lshw shows:
>>>>>>>>> description: Ethernet interface
>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>> physical id: 0
>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>> logical name: enp3s0
>>>>>>>>> version: 0c
>>>>>>>>> serial:
>>>>>>>>> size: 1Gbit/s
>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>> width: 64 bits
>>>>>>>>> clock: 33MHz
>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>
>>>>>>>> - Can you provide any measurements?
>>>>>>>> - iperf results before and after
>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>> - Do you use jumbo packets?
>>>>>>>>
>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>
>>>>>>>> Heiner
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 6:21 ` Heiner Kallweit
@ 2019-01-31 6:35 ` Heiner Kallweit
2019-01-31 6:49 ` Heiner Kallweit
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-31 6:35 UTC (permalink / raw)
To: David Chang; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev
Hi David, two more things:
1. Could you please test a recent linux-next kernel?
2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
and compare them.
Heiner
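
The register-dump comparison can be done line by line once each kernel's `ethtool -d <if>` output has been saved to a text file. The following is a minimal sketch of that comparison. The function name and the file names in the usage note are hypothetical, and the code assumes nothing about the layout of the dump beyond it being plain text.

```python
import difflib


def changed_lines(old_dump, new_dump):
    """Return only the lines that differ between two register dumps.

    Lines prefixed with '-' come from the old dump, '+' from the new one.
    """
    diff = list(difflib.unified_diff(
        old_dump.splitlines(), new_dump.splitlines(),
        fromfile="4.18", tofile="4.19", lineterm="", n=0))
    # Skip the two file-header lines, then drop the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def compare_files(old_path, new_path):
    """Read two saved dumps from disk and return their differing lines."""
    with open(old_path) as old, open(new_path) as new:
        return changed_lines(old.read(), new.read())
```

For example, `compare_files("regs-4.18.txt", "regs-4.19.txt")` would return the differing lines for dumps saved under those (hypothetical) names. An empty list means the two dumps are textually identical.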
On 31.01.2019 07:21, Heiner Kallweit wrote:
> David, thanks for the link to the bug ticket.
> I think only a proper bisect can help to find the offending commit.
>
> Heiner
>
>
> On 31.01.2019 03:32, David Chang wrote:
>> Hi,
>>
>> We had a similar case here.
>> - Realtek r8169 receive performance regression in kernel 4.19
>> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>
>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>> The major symptom is a high rx_missed count.
>>
>>
>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>> Hi Peter,
>>>
>>> recently I had somebody for whom pcie_aspm=off, for whatever reason, didn't
>>> do the trick. Can you also check with pcie_aspm.policy=performance?
>>
>> We will give it a try later.
>>
>>> And please check with "ethtool -S <if>" whether the chip statistics
>>> show a significant number of errors.
>>>
>>> If this doesn't help you may have to bisect to find the offending commit.
>>
>> We tried falling back the driver to a few earlier commits, as follows,
>> but with no luck.
>>
>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>
>> Thanks,
>> David Chang
>>
>>>
>>> Heiner
>>>
>>>
>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>> Hi Heiner,
>>>>
>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>> and this made no difference.
>>>>
>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>> confirm that this immediately resolved the issue and access to the NFS
>>>> shares operated as expected.
>>>>
>>>> I presume this means it is an issue with the r8169 driver included in
>>>> 4.19 onwards?
>>>>
>>>> To answer your last questions:
>>>>
>>>> Base Board Information
>>>> Manufacturer: Alienware
>>>> Product Name: 0PGRP5
>>>> Version: A02
>>>>
>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>
>>>> Regards,
>>>>
>>>> Peter.
>>>>
>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>> A few older systems seem to have issues with ASPM. What kind of
>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>> network chip?
>>>>>
>>>>> Rgds, Heiner
>>>>>
>>>>>
>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>> Hi Heiner,
>>>>>>
>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>> a good idea.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Peter.
>>>>>>
>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>> What you could do:
>>>>>>>
>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>
>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>
>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>
>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>> elsewhere in the network subsystem?
>>>>>>>
>>>>>>> Heiner
>>>>>>>
>>>>>>>
>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>> Hi Heiner,
>>>>>>>>
>>>>>>>> Thanks for getting back to me.
>>>>>>>>
>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>
>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>
>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>
>>>>>>>> netstat -s |grep retransmitted
>>>>>>>>
>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>> 4.19.18:
>>>>>>>>
>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>> the following:
>>>>>>>> real 0m19.867s
>>>>>>>> user 0m0.012s
>>>>>>>> sys 0m0.036s
>>>>>>>>
>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>> real 0m0.300s
>>>>>>>> user 0m0.004s
>>>>>>>> sys 0m0.007s
>>>>>>>>
>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>
>>>>>>>> dmesg XID:
>>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>
>>>>>>>> # lspci -vv
>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>> Latency: 0, Cache Line Size: 64 bytes
>>>>>>>> Interrupt: pin A routed to IRQ 19
>>>>>>>> Region 0: I/O ports at d000 [size=256]
>>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>> Capabilities: [40] Power Management version 3
>>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>> Address: 0000000000000000 Data: 0000
>>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>> <512ns, L1 <64us
>>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>> OBFF Via message/WAKE#
>>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>> OBFF Disabled
>>>>>>>> AtomicOpsCtl: ReqEn-
>>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>> Transmit Margin: Normal Operating Range,
>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>> Compliance De-emphasis: -6dB
>>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>> Vector table: BAR=4 offset=00000000
>>>>>>>> PBA: BAR=4 offset=00000800
>>>>>>>> Capabilities: [d0] Vital Product Data
>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>> Not readable
>>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>> Capabilities: [140 v1] Virtual Channel
>>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>>> Ctrl: ArbSelect=Fixed
>>>>>>>> Status: InProgress-
>>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>> Status: NegoPending- InProgress-
>>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>> Max snoop latency: 71680ns
>>>>>>>> Max no snoop latency: 71680ns
>>>>>>>> Kernel driver in use: r8169
>>>>>>>> Kernel modules: r8169
>>>>>>>>
>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>
>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>
>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>> situation.
>>>>>>>>>>
>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>
>>>>>>>>>> lshw shows:
>>>>>>>>>> description: Ethernet interface
>>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>> physical id: 0
>>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>>> logical name: enp3s0
>>>>>>>>>> version: 0c
>>>>>>>>>> serial:
>>>>>>>>>> size: 1Gbit/s
>>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>>> width: 64 bits
>>>>>>>>>> clock: 33MHz
>>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>
>>>>>>>>>> Kind Regards,
>>>>>>>>>>
>>>>>>>>>> Peter.
>>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>
>>>>>>>>> - Can you provide any measurements?
>>>>>>>>> - iperf results before and after
>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>
>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>
>>>>>>>>> Heiner
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 6:35 ` Heiner Kallweit
@ 2019-01-31 6:49 ` Heiner Kallweit
2019-01-31 7:23 ` David Chang
2019-02-01 4:29 ` David Chang
2 siblings, 0 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-31 6:49 UTC (permalink / raw)
To: David Chang; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev
And one more question:
So far I have read about the issue only in combination with NFS.
Does the issue also occur with iperf or some other type of
high network load?
Heiner
On 31.01.2019 07:35, Heiner Kallweit wrote:
> Hi David, two more things:
>
> 1. Could you please test a recent linux-next kernel?
> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> and compare them.
>
> Heiner
>
>
> On 31.01.2019 07:21, Heiner Kallweit wrote:
>> David, thanks for the link to the bug ticket.
>> I think only a proper bisect can help to find the offending commit.
>>
>> Heiner
>>
>>
>> On 31.01.2019 03:32, David Chang wrote:
>>> Hi,
>>>
>>> We had a similar case here.
>>> - Realtek r8169 receive performance regression in kernel 4.19
>>> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>>
>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>>> The major symptom is a high rx_missed count.
>>>
>>>
>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>>> Hi Peter,
>>>>
>>>> recently I had somebody for whom pcie_aspm=off, for whatever reason, didn't
>>>> do the trick. Can you also check with pcie_aspm.policy=performance?
>>>
>>> We will give it a try later.
>>>
>>>> And please check with "ethtool -S <if>" whether the chip statistics
>>>> show a significant number of errors.
>>>>
>>>> If this doesn't help you may have to bisect to find the offending commit.
>>>
>>> We tried falling back the driver to a few earlier commits, as follows,
>>> but with no luck.
>>>
>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>>
>>> Thanks,
>>> David Chang
>>>
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>>> and this made no difference.
>>>>>
>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>>> confirm that this immediately resolved the issue and access to the NFS
>>>>> shares operated as expected.
>>>>>
>>>>> I presume this means it is an issue with the r8169 driver included in
>>>>> 4.19 onwards?
>>>>>
>>>>> To answer your last questions:
>>>>>
>>>>> Base Board Information
>>>>> Manufacturer: Alienware
>>>>> Product Name: 0PGRP5
>>>>> Version: A02
>>>>>
>>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> I think the vendor driver doesn't enable ASPM by default.
>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>>> A few older systems seem to have issues with ASPM. What kind of
>>>>>> system / mainboard are you using? Is the RTL8168 the onboard
>>>>>> network chip?
>>>>>>
>>>>>> Rgds, Heiner
>>>>>>
>>>>>>
>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>>> a good idea.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> at first glance it doesn't look like a typical driver issue.
>>>>>>>> What you could do:
>>>>>>>>
>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>>
>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>>
>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>>
>>>>>>>> Any specific reason why you think the root cause is in the driver and not
>>>>>>>> elsewhere in the network subsystem?
>>>>>>>>
>>>>>>>> Heiner
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>>> Hi Heiner,
>>>>>>>>>
>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>
>>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>>
>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>>
>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>>
>>>>>>>>> netstat -s |grep retransmitted
>>>>>>>>>
>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>>> 4.19.18:
>>>>>>>>>
>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>>> the following:
>>>>>>>>> real 0m19.867s
>>>>>>>>> user 0m0.012s
>>>>>>>>> sys 0m0.036s
>>>>>>>>>
>>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>> real 0m0.300s
>>>>>>>>> user 0m0.004s
>>>>>>>>> sys 0m0.007s
>>>>>>>>>
>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>>
>>>>>>>>> dmesg XID:
>>>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>>
>>>>>>>>> # lspci -vv
>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>> Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>> Interrupt: pin A routed to IRQ 19
>>>>>>>>> Region 0: I/O ports at d000 [size=256]
>>>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>> Capabilities: [40] Power Management version 3
>>>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>> Address: 0000000000000000 Data: 0000
>>>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>>> <512ns, L1 <64us
>>>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>>> OBFF Disabled
>>>>>>>>> AtomicOpsCtl: ReqEn-
>>>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>> Transmit Margin: Normal Operating Range,
>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>> Compliance De-emphasis: -6dB
>>>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>> Vector table: BAR=4 offset=00000000
>>>>>>>>> PBA: BAR=4 offset=00000800
>>>>>>>>> Capabilities: [d0] Vital Product Data
>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>> Not readable
>>>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>> Capabilities: [140 v1] Virtual Channel
>>>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>>>> Ctrl: ArbSelect=Fixed
>>>>>>>>> Status: InProgress-
>>>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>> Status: NegoPending- InProgress-
>>>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>> Max snoop latency: 71680ns
>>>>>>>>> Max no snoop latency: 71680ns
>>>>>>>>> Kernel driver in use: r8169
>>>>>>>>> Kernel modules: r8169
>>>>>>>>>
>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>>
>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>>
>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>>> situation.
>>>>>>>>>>>
>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>>
>>>>>>>>>>> lshw shows:
>>>>>>>>>>> description: Ethernet interface
>>>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>> physical id: 0
>>>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>>>> logical name: enp3s0
>>>>>>>>>>> version: 0c
>>>>>>>>>>> serial:
>>>>>>>>>>> size: 1Gbit/s
>>>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>>>> width: 64 bits
>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>
>>>>>>>>>>> Peter.
>>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>>
>>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>> - iperf results before and after
>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>>
>>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>>
>>>>>>>>>> Heiner
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 6:35 ` Heiner Kallweit
2019-01-31 6:49 ` Heiner Kallweit
@ 2019-01-31 7:23 ` David Chang
2019-01-31 12:09 ` Peter Ceiley
2019-02-01 4:29 ` David Chang
2 siblings, 1 reply; 25+ messages in thread
From: David Chang @ 2019-01-31 7:23 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev
Hi Heiner,
On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> Hi David, two more things:
>
> 1. Could you please test a recent linux-next kernel?
> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> and compare them.
I'm sorry, I don't have the affected machine at hand. I will ask
our user to run the tests. Thanks!
Regards,
David
>
> Heiner
>
>
> On 31.01.2019 07:21, Heiner Kallweit wrote:
> > David, thanks for the link to the bug ticket.
> > I think only a proper bisect can help to find the offending commit.
> >
> > Heiner
> >
> >
> > On 31.01.2019 03:32, David Chang wrote:
> >> Hi,
> >>
> >> We had a similar case here.
> >> - Realtek r8169 receive performance regression in kernel 4.19
> >> https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>
> >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >> The major symptom is a high rx_missed count.
> >>
> >>
> >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>> Hi Peter,
> >>>
> >>> recently I had somebody for whom pcie_aspm=off, for whatever reason, didn't
> >>> do the trick. Can you also check with pcie_aspm.policy=performance?
> >>
> >> We will give it a try later.
> >>
> >>> And please check with "ethtool -S <if>" whether the chip statistics
> >>> show a significant number of errors.
> >>>
> >>> If this doesn't help you may have to bisect to find the offending commit.
> >>
> >> We tried falling back the driver to a few earlier commits, as follows,
> >> but with no luck.
> >>
> >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>
> >> Thanks,
> >> David Chang
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>> Hi Heiner,
> >>>>
> >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>> and this made no difference.
> >>>>
> >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>> confirm that this immediately resolved the issue and access to the NFS
> >>>> shares operated as expected.
> >>>>
> >>>> I presume this means it is an issue with the r8169 driver included in
> >>>> 4.19 onwards?
> >>>>
> >>>> To answer your last questions:
> >>>>
> >>>> Base Board Information
> >>>> Manufacturer: Alienware
> >>>> Product Name: 0PGRP5
> >>>> Version: A02
> >>>>
> >>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Peter.
> >>>>
> >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>
> >>>>> Hi Peter,
> >>>>>
> >>>>> I think the vendor driver doesn't enable ASPM by default.
> >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>>> A few older systems seem to have issues with ASPM. What kind of
> >>>>> system / mainboard are you using? Is the RTL8168 the onboard
> >>>>> network chip?
> >>>>>
> >>>>> Rgds, Heiner
> >>>>>
> >>>>>
> >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>> Hi Heiner,
> >>>>>>
> >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>> a good idea.
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Peter.
> >>>>>>
> >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Peter,
> >>>>>>>
> >>>>>>> at first glance it doesn't look like a typical driver issue.
> >>>>>>> What you could do:
> >>>>>>>
> >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>
> >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>
> >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>>
> >>>>>>> Any specific reason why you think the root cause is in the driver and not
> >>>>>>> elsewhere in the network subsystem?
> >>>>>>>
> >>>>>>> Heiner
> >>>>>>>
> >>>>>>>
> >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>> Hi Heiner,
> >>>>>>>>
> >>>>>>>> Thanks for getting back to me.
> >>>>>>>>
> >>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>
> >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>
> >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>> troubleshoot this issue. Running the following
> >>>>>>>>
> >>>>>>>> netstat -s |grep retransmitted
> >>>>>>>>
> >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>> 4.19.18:
> >>>>>>>>
> >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>> the following:
> >>>>>>>> real 0m19.867s
> >>>>>>>> user 0m0.012s
> >>>>>>>> sys 0m0.036s
> >>>>>>>>
> >>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>> real 0m0.300s
> >>>>>>>> user 0m0.004s
> >>>>>>>> sys 0m0.007s
> >>>>>>>>
> >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>>
> >>>>>>>> dmesg XID:
> >>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>>
> >>>>>>>> # lspci -vv
> >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>> Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>> Interrupt: pin A routed to IRQ 19
> >>>>>>>> Region 0: I/O ports at d000 [size=256]
> >>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>> Capabilities: [40] Power Management version 3
> >>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>> Address: 0000000000000000 Data: 0000
> >>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>> <512ns, L1 <64us
> >>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>> OBFF Via message/WAKE#
> >>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>> OBFF Disabled
> >>>>>>>> AtomicOpsCtl: ReqEn-
> >>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>> Transmit Margin: Normal Operating Range,
> >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>> Compliance De-emphasis: -6dB
> >>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>> Vector table: BAR=4 offset=00000000
> >>>>>>>> PBA: BAR=4 offset=00000800
> >>>>>>>> Capabilities: [d0] Vital Product Data
> >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>> Not readable
> >>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>> Capabilities: [140 v1] Virtual Channel
> >>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
> >>>>>>>> Ctrl: ArbSelect=Fixed
> >>>>>>>> Status: InProgress-
> >>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>> Status: NegoPending- InProgress-
> >>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>> Max snoop latency: 71680ns
> >>>>>>>> Max no snoop latency: 71680ns
> >>>>>>>> Kernel driver in use: r8169
> >>>>>>>> Kernel modules: r8169
> >>>>>>>>
> >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> Peter.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>
> >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>
> >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>> situation.
> >>>>>>>>>>
> >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>
> >>>>>>>>>> lshw shows:
> >>>>>>>>>> description: Ethernet interface
> >>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>> physical id: 0
> >>>>>>>>>> bus info: pci@0000:03:00.0
> >>>>>>>>>> logical name: enp3s0
> >>>>>>>>>> version: 0c
> >>>>>>>>>> serial:
> >>>>>>>>>> size: 1Gbit/s
> >>>>>>>>>> capacity: 1Gbit/s
> >>>>>>>>>> width: 64 bits
> >>>>>>>>>> clock: 33MHz
> >>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>> resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>
> >>>>>>>>>> Kind Regards,
> >>>>>>>>>>
> >>>>>>>>>> Peter.
> >>>>>>>>>>
> >>>>>>>>> Hi Peter,
> >>>>>>>>>
> >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>
> >>>>>>>>> - Can you provide any measurements?
> >>>>>>>>> - iperf results before and after
> >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>
> >>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>
> >>>>>>>>> Heiner
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 7:23 ` David Chang
@ 2019-01-31 12:09 ` Peter Ceiley
2019-01-31 18:28 ` Heiner Kallweit
0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-31 12:09 UTC (permalink / raw)
To: David Chang; +Cc: Heiner Kallweit, Realtek linux nic maintainers, netdev
Hi Heiner,
A quick update on my testing with different pcie_aspm settings:
Kernel parameter                | Result
pcie_aspm=off                   | no change
pcie_aspm.policy=default        | no change
pcie_aspm.policy=performance    | issue resolved
pcie_aspm.policy=powersave      | issue resolved
pcie_aspm.policy=powersupersave | issue resolved
It seems the new driver does not play nicely with the default ASPM policy.
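For anyone repeating the ASPM tests above: the policy that is actually in
effect can be read back at runtime from sysfs, where the kernel brackets
the active entry. What follows is only a minimal sketch, assuming the usual
/sys/module/pcie_aspm/parameters/policy path; the helper name
active_aspm_policy and its optional file argument are made up for
illustration and are not part of any tool.

```shell
#!/bin/sh
# Print the currently active ASPM policy.
# The policy file lists every policy and brackets the active one, e.g.
#   [default] performance powersave powersupersave
# The optional argument (hypothetical, for testing) overrides the path.
active_aspm_policy() {
    policy_file="${1:-/sys/module/pcie_aspm/parameters/policy}"
    # Keep only the text between the square brackets.
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$policy_file"
}
```

Writing a policy name to the same file as root (for example
echo performance > /sys/module/pcie_aspm/parameters/policy) should switch
the policy without a reboot, although the write may be rejected on systems
where the kernel does not control ASPM.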
As requested, I've included an output of ethtool below when experiencing
the issue - note that no errors are recorded.
# ethtool -S enp3s0
NIC statistics:
tx_packets: 2749
rx_packets: 4089
tx_errors: 0
rx_errors: 0
rx_missed: 0
align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
unicast: 4078
broadcast: 9
multicast: 2
tx_aborted: 0
tx_underrun: 0
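To make such a check repeatable, the error-type counters can be filtered
out of the statistics automatically. This is a rough sketch only: the
function name nonzero_error_counters is hypothetical, and the counter-name
pattern is an assumption based on the r8169 counters listed above.

```shell
#!/bin/sh
# Read "ethtool -S <if>" output on stdin and print only the error-type
# counters (errors, missed, aborted, underrun, collisions) that are non-zero.
nonzero_error_counters() {
    awk -F': *' '
        /errors|missed|aborted|underrun|collisions/ && $2 + 0 != 0 {
            gsub(/^[ \t]+/, "", $1)   # drop leading indentation
            print $1 ": " $2
        }'
}
# Example: ethtool -S enp3s0 | nonzero_error_counters
```

For the statistics shown above this prints nothing, since every
error-type counter is zero.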
David, I hope this helps your user as well. I appreciate you sharing
the bug ticket - thanks.
Heiner, thanks very much for your help to date.
Regards,
Peter.
On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@suse.com> wrote:
>
> Hi Heiner,
>
> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> > Hi David, two more things:
> >
> > 1. Could you please test a recent linux-next kernel?
> > 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> > and compare them.
>
> I'm sorry that I do not have the issue machine handy. I would ask
> our user to do the test. Thanks!
>
> Regards,
> David
>
> >
> > Heiner
> >
> >
> > On 31.01.2019 07:21, Heiner Kallweit wrote:
> > > David, thanks for the link to the bug ticket.
> > > I think only a proper bisect can help to find the offending commit.
> > >
> > > Heiner
> > >
> > >
> > > On 31.01.2019 03:32, David Chang wrote:
> > >> Hi,
> > >>
> > >> We had a similar case here.
> > >> - Realtek r8169 receive performance regression in kernel 4.19
> > >> https://bugzilla.suse.com/show_bug.cgi?id=1119649
> > >>
> > >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > >> The major symptom is a high rx_missed count.
> > >>
> > >>
> > >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> > >>> Hi Peter,
> > >>>
> > >>> recently I had somebody for whom pcie_aspm=off, for whatever reason, didn't
> > >>> do the trick. Can you also check with pcie_aspm.policy=performance?
> > >>
> > >> We will give it a try later.
> > >>
> > >>> And please check with "ethtool -S <if>" whether the chip statistics
> > >>> show a significant number of errors.
> > >>>
> > >>> If this doesn't help you may have to bisect to find the offending commit.
> > >>
> > >> We had tried fallback driver to a few previous commits as following,
> > >> but with no luck.
> > >>
> > >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> > >>
> > >> Thanks,
> > >> David Chang
> > >>
> > >>>
> > >>> Heiner
> > >>>
> > >>>
> > >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> > >>>> Hi Heiner,
> > >>>>
> > >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> > >>>> and this made no difference.
> > >>>>
> > >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> > >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> > >>>> confirm that this immediately resolved the issue and access to the NFS
> > >>>> shares operated as expected.
> > >>>>
> > >>>> I presume this means it is an issue with the r8169 driver included in
> > >>>> 4.19 onwards?
> > >>>>
> > >>>> To answer your last questions:
> > >>>>
> > >>>> Base Board Information
> > >>>> Manufacturer: Alienware
> > >>>> Product Name: 0PGRP5
> > >>>> Version: A02
> > >>>>
> > >>>> ... and yes, the RTL8168 is the onboard network chip.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Peter.
> > >>>>
> > >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi Peter,
> > >>>>>
> > >>>>> I think the vendor driver doesn't enable ASPM per default.
> > >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> > >>>>> Few older systems seem to have issues with ASPM, what kind of
> > >>>>> system / mainboard are you using? The RTL8168 is the onboard
> > >>>>> network chip?
> > >>>>>
> > >>>>> Rgds, Heiner
> > >>>>>
> > >>>>>
> > >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> > >>>>>> Hi Heiner,
> > >>>>>>
> > >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> > >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> > >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> > >>>>>> a good idea.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>>
> > >>>>>> Peter.
> > >>>>>>
> > >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Hi Peter,
> > >>>>>>>
> > >>>>>>> at a first glance it doesn't look like a typical driver issue.
> > >>>>>>> What you could do:
> > >>>>>>>
> > >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> > >>>>>>>
> > >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> > >>>>>>>
> > >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> > >>>>>>>
> > >>>>>>> Any specific reason why you think root cause is in the driver and not
> > >>>>>>> elsewhere in the network subsystem?
> > >>>>>>>
> > >>>>>>> Heiner
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> > >>>>>>>> Hi Heiner,
> > >>>>>>>>
> > >>>>>>>> Thanks for getting back to me.
> > >>>>>>>>
> > >>>>>>>> No, I don't use jumbo packets.
> > >>>>>>>>
> > >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> > >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> > >>>>>>>> establishing a connection and is most notable, for example, on my
> > >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> > >>>>>>>> larger directories) to list the contents of each directory. Once a
> > >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> > >>>>>>>>
> > >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> > >>>>>>>> troubleshoot this issue. Running the following
> > >>>>>>>>
> > >>>>>>>> netstat -s |grep retransmitted
> > >>>>>>>>
> > >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> > >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> > >>>>>>>> directory containing 345 media files did the following using kernel
> > >>>>>>>> 4.19.18:
> > >>>>>>>>
> > >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> > >>>>>>>> the following:
> > >>>>>>>> real 0m19.867s
> > >>>>>>>> user 0m0.012s
> > >>>>>>>> sys 0m0.036s
> > >>>>>>>>
> > >>>>>>>> The same command shows no retransmitted segments running kernel
> > >>>>>>>> 4.18.16 and 'time' showed:
> > >>>>>>>> real 0m0.300s
> > >>>>>>>> user 0m0.004s
> > >>>>>>>> sys 0m0.007s
> > >>>>>>>>
> > >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> > >>>>>>>>
> > >>>>>>>> dmesg XID:
> > >>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> > >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> > >>>>>>>>
> > >>>>>>>> # lspci -vv
> > >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> > >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> > >>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> > >>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> > >>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> > >>>>>>>> Latency: 0, Cache Line Size: 64 bytes
> > >>>>>>>> Interrupt: pin A routed to IRQ 19
> > >>>>>>>> Region 0: I/O ports at d000 [size=256]
> > >>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> > >>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> > >>>>>>>> Capabilities: [40] Power Management version 3
> > >>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> > >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> > >>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > >>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> > >>>>>>>> Address: 0000000000000000 Data: 0000
> > >>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
> > >>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> > >>>>>>>> <512ns, L1 <64us
> > >>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> > >>>>>>>> SlotPowerLimit 10.000W
> > >>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > >>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > >>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> > >>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > >>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> > >>>>>>>> Latency L0s unlimited, L1 <64us
> > >>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> > >>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> > >>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> > >>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> > >>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > >>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> > >>>>>>>> OBFF Via message/WAKE#
> > >>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> > >>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> > >>>>>>>> OBFF Disabled
> > >>>>>>>> AtomicOpsCtl: ReqEn-
> > >>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> > >>>>>>>> Transmit Margin: Normal Operating Range,
> > >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> > >>>>>>>> Compliance De-emphasis: -6dB
> > >>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
> > >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> > >>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > >>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> > >>>>>>>> Vector table: BAR=4 offset=00000000
> > >>>>>>>> PBA: BAR=4 offset=00000800
> > >>>>>>>> Capabilities: [d0] Vital Product Data
> > >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> > >>>>>>>> Not readable
> > >>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
> > >>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > >>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > >>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > >>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> > >>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> > >>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> > >>>>>>>> ECRCChkCap+ ECRCChkEn-
> > >>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> > >>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
> > >>>>>>>> Capabilities: [140 v1] Virtual Channel
> > >>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> > >>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
> > >>>>>>>> Ctrl: ArbSelect=Fixed
> > >>>>>>>> Status: InProgress-
> > >>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > >>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> > >>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> > >>>>>>>> Status: NegoPending- InProgress-
> > >>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> > >>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
> > >>>>>>>> Max snoop latency: 71680ns
> > >>>>>>>> Max no snoop latency: 71680ns
> > >>>>>>>> Kernel driver in use: r8169
> > >>>>>>>> Kernel modules: r8169
> > >>>>>>>>
> > >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> > >>>>>>>>
> > >>>>>>>> Thanks!
> > >>>>>>>>
> > >>>>>>>> Peter.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> > >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> > >>>>>>>>>>
> > >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> > >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> > >>>>>>>>>> 4.20.4 & 4.19.18).
> > >>>>>>>>>>
> > >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> > >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> > >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> > >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> > >>>>>>>>>> reload the driver on a running system, but this does not improve the
> > >>>>>>>>>> situation.
> > >>>>>>>>>>
> > >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> > >>>>>>>>>>
> > >>>>>>>>>> lshw shows:
> > >>>>>>>>>> description: Ethernet interface
> > >>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> > >>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
> > >>>>>>>>>> physical id: 0
> > >>>>>>>>>> bus info: pci@0000:03:00.0
> > >>>>>>>>>> logical name: enp3s0
> > >>>>>>>>>> version: 0c
> > >>>>>>>>>> serial:
> > >>>>>>>>>> size: 1Gbit/s
> > >>>>>>>>>> capacity: 1Gbit/s
> > >>>>>>>>>> width: 64 bits
> > >>>>>>>>>> clock: 33MHz
> > >>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
> > >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> > >>>>>>>>>> 1000bt-fd autonegotiation
> > >>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
> > >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> > >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> > >>>>>>>>>> resources: irq:19 ioport:d000(size=256)
> > >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> > >>>>>>>>>>
> > >>>>>>>>>> Kind Regards,
> > >>>>>>>>>>
> > >>>>>>>>>> Peter.
> > >>>>>>>>>>
> > >>>>>>>>> Hi Peter,
> > >>>>>>>>>
> > >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> > >>>>>>>>>
> > >>>>>>>>> - Can you provide any measurements?
> > >>>>>>>>> - iperf results before and after
> > >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> > >>>>>>>>> - Do you use jumbo packets?
> > >>>>>>>>>
> > >>>>>>>>> Also help would be a "lspci -vv" output for the network card and
> > >>>>>>>>> the dmesg output line with the chip XID.
> > >>>>>>>>>
> > >>>>>>>>> Heiner
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 12:09 ` Peter Ceiley
@ 2019-01-31 18:28 ` Heiner Kallweit
2019-02-01 4:27 ` David Chang
0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-31 18:28 UTC (permalink / raw)
To: Peter Ceiley, David Chang; +Cc: Realtek linux nic maintainers, netdev
Thanks for testing, Peter!
So we have an ASPM-related issue indeed. I'm aware that there are certain
incompatibilities between board chipsets and network chip versions
(although it's not known which combinations are affected).
And we don't know whether it's a hardware or BIOS issue.
Older driver versions dealt with this by simply disabling ASPM in general.
As a result, no system with a supported Realtek chip could reach the deeper
package power-saving states, which significantly reduced battery
life on notebooks.
The network driver has no say in the ASPM policies; these are
handled by the lower PCI layers.
Unfortunately we can't detect ASPM incompatibilities at runtime. Maybe
we could build some heuristics based on rx_missed percentage, but it's
not clear that ASPM issues always show the same symptoms.
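To make the rx_missed idea concrete, here is a minimal sketch of such a heuristic in Python. It is not part of the driver and only an illustration: the function names are made up, it works on the text printed by "ethtool -S <if>", and it assumes that missed frames are not already included in rx_packets.

```python
import re


def parse_ethtool_stats(text):
    """Parse `ethtool -S <if>` output into a dict of counter name -> int."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "rx_missed: 0"; the "NIC statistics:" header
        # has no numeric value and is skipped.
        match = re.match(r"\s*([\w.-]+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def rx_missed_percent(stats):
    """Missed frames as a percentage of all frames the chip saw (assumption:
    rx_packets does not include the missed ones)."""
    missed = stats.get("rx_missed", 0)
    total = stats.get("rx_packets", 0) + missed
    return 100.0 * missed / total if total else 0.0


# Counters taken from the ethtool output posted in this thread.
SAMPLE = """NIC statistics:
tx_packets: 2749
rx_packets: 4089
rx_missed: 0
"""

stats = parse_ethtool_stats(SAMPLE)
print(f"rx_missed: {rx_missed_percent(stats):.2f}%")
```

With the counters Peter posted this reports 0.00%, so a check based on rx_missed alone would not have flagged his system, which matches the caveat above.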
So for now people with affected systems have to set a proper
pcie_aspm.policy parameter.
What remains unclear to me is why pcie_aspm=off doesn't help.
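When testing different settings it can help to confirm which policy the kernel is actually using. The sketch below reads /sys/module/pcie_aspm/parameters/policy, where the active entry is shown in square brackets. The helper name active_policy is made up for illustration, and the file is only expected to exist on kernels built with PCIe ASPM support.

```python
from pathlib import Path

# sysfs file listing all ASPM policies, with the active one in brackets,
# e.g. "[default] performance powersave powersupersave".
POLICY_FILE = Path("/sys/module/pcie_aspm/parameters/policy")


def active_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    if POLICY_FILE.exists():
        print("active ASPM policy:", active_policy(POLICY_FILE.read_text()))
    else:
        print("pcie_aspm policy file not found (ASPM support may not be built in)")
```

Comparing this value after booting with each kernel parameter shows whether the parameter was picked up at all.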
@David:
I assume you'll check with the affected user to test the ASPM policy
parameter.
Heiner
On 31.01.2019 13:09, Peter Ceiley wrote:
> Hi Heiner,
>
> A quick update on my testing with different pcie_aspm settings:
>
> pcie_aspm=off | no change
> pcie_aspm.policy=default | no change
> pcie_aspm.policy=performance | issue resolved
> pcie_aspm.policy=powersave | issue resolved
> pcie_aspm.policy=powersupersave | issue resolved
>
> It seems the new driver does not play nicely with the default ASPM policy.
>
> As requested, I've included an output of ethtool below when experiencing
> the issue - note that no errors are recorded.
>
> # ethtool -S enp3s0
> NIC statistics:
> tx_packets: 2749
> rx_packets: 4089
> tx_errors: 0
> rx_errors: 0
> rx_missed: 0
> align_errors: 0
> tx_single_collisions: 0
> tx_multi_collisions: 0
> unicast: 4078
> broadcast: 9
> multicast: 2
> tx_aborted: 0
> tx_underrun: 0
>
> David, I hope this helps for your user as well. I appreciate you sharing
> the bug ticket - thanks.
>
> Heiner, thanks very much for your help to date.
>
> Regards,
>
> Peter.
>
> On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@suse.com> wrote:
>>
>> Hi Heiner,
>>
>> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
>>> Hi David, two more things:
>>>
>>> 1. Could you please test a recent linux-next kernel?
>>> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>>> and compare them.
>>
>> I'm sorry that I do not have the issue machine handy. I would ask
>> our user to do the test. Thanks!
>>
>> Regards,
>> David
>>
>>>
>>> Heiner
>>>
>>>
>>> On 31.01.2019 07:21, Heiner Kallweit wrote:
>>>> David, thanks for the link to the bug ticket.
>>>> I think only a proper bisect can help to find the offending commit.
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 31.01.2019 03:32, David Chang wrote:
>>>>> Hi,
>>>>>
>>>>> We had a similr case here.
>>>>> - Realtek r8169 receive performance regression in kernel 4.19
>>>>> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>>>>
>>>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>>>>> The major symptom is there are many rx_missed count.
>>>>>
>>>>>
>>>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>>>>>> do the trick, can you also check with pcie_aspm.policy=performance.
>>>>>
>>>>> We will give it a try later.
>>>>>
>>>>>> And please check with "ethtool -S <if>" whether the chip statistics
>>>>>> show a significant number of errors.
>>>>>>
>>>>>> If this doesn't help you may have to bisect to find the offending commit.
>>>>>
>>>>> We had tried fallback driver to a few previous commits as following,
>>>>> but with no luck.
>>>>>
>>>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>>>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>>>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>>>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>>>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>>>>
>>>>> Thanks,
>>>>> David Chang
>>>>>
>>>>>>
>>>>>> Heiner
>>>>>>
>>>>>>
>>>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>>>>> and this made no difference.
>>>>>>>
>>>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>>>>> confirm that this immediately resolved the issue and access to the NFS
>>>>>>> shares operated as expected.
>>>>>>>
>>>>>>> I presume this means it is an issue with the r8169 driver included in
>>>>>>> 4.19 onwards?
>>>>>>>
>>>>>>> To answer your last questions:
>>>>>>>
>>>>>>> Base Board Information
>>>>>>> Manufacturer: Alienware
>>>>>>> Product Name: 0PGRP5
>>>>>>> Version: A02
>>>>>>>
>>>>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>>>>> Few older systems seem to have issues with ASPM, what kind of
>>>>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>>>>> network chip?
>>>>>>>>
>>>>>>>> Rgds, Heiner
>>>>>>>>
>>>>>>>>
>>>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>>>>> Hi Heiner,
>>>>>>>>>
>>>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>>>>> a good idea.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>>>>> What you could do:
>>>>>>>>>>
>>>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>>>>
>>>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>>>>
>>>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>>>>
>>>>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>>>>> elsewhere in the network subsystem?
>>>>>>>>>>
>>>>>>>>>> Heiner
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>>>>> Hi Heiner,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>>>
>>>>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>>>>
>>>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>>>>
>>>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>>>>
>>>>>>>>>>> netstat -s |grep retransmitted
>>>>>>>>>>>
>>>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>>>>> 4.19.18:
>>>>>>>>>>>
>>>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>>>>> the following:
>>>>>>>>>>> real 0m19.867s
>>>>>>>>>>> user 0m0.012s
>>>>>>>>>>> sys 0m0.036s
>>>>>>>>>>>
>>>>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>>>> real 0m0.300s
>>>>>>>>>>> user 0m0.004s
>>>>>>>>>>> sys 0m0.007s
>>>>>>>>>>>
>>>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>>>>
>>>>>>>>>>> dmesg XID:
>>>>>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>>>>
>>>>>>>>>>> # lspci -vv
>>>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>>>> Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>>>> Interrupt: pin A routed to IRQ 19
>>>>>>>>>>> Region 0: I/O ports at d000 [size=256]
>>>>>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>>>> Capabilities: [40] Power Management version 3
>>>>>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>>>> Address: 0000000000000000 Data: 0000
>>>>>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>>>>> <512ns, L1 <64us
>>>>>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>>>>> OBFF Disabled
>>>>>>>>>>> AtomicOpsCtl: ReqEn-
>>>>>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>>>> Transmit Margin: Normal Operating Range,
>>>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>>>> Compliance De-emphasis: -6dB
>>>>>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>>>> Vector table: BAR=4 offset=00000000
>>>>>>>>>>> PBA: BAR=4 offset=00000800
>>>>>>>>>>> Capabilities: [d0] Vital Product Data
>>>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>>>> Not readable
>>>>>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>>>> Capabilities: [140 v1] Virtual Channel
>>>>>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>>>>>> Ctrl: ArbSelect=Fixed
>>>>>>>>>>> Status: InProgress-
>>>>>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>>>> Status: NegoPending- InProgress-
>>>>>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>>>> Max snoop latency: 71680ns
>>>>>>>>>>> Max no snoop latency: 71680ns
>>>>>>>>>>> Kernel driver in use: r8169
>>>>>>>>>>> Kernel modules: r8169
>>>>>>>>>>>
>>>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>> Peter.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>>>>
>>>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>>>>
>>>>>>>>>>>>> lshw shows:
>>>>>>>>>>>>> description: Ethernet interface
>>>>>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>>>> physical id: 0
>>>>>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>>>>>> logical name: enp3s0
>>>>>>>>>>>>> version: 0c
>>>>>>>>>>>>> serial:
>>>>>>>>>>>>> size: 1Gbit/s
>>>>>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>>>>>> width: 64 bits
>>>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter.
>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>>>>
>>>>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>>>> - iperf results before and after
>>>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>>>>
>>>>>>>>>>>> Also help would be a "lspci -vv" output for the network card and
>>>>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>>>>
>>>>>>>>>>>> Heiner
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 18:28 ` Heiner Kallweit
@ 2019-02-01 4:27 ` David Chang
0 siblings, 0 replies; 25+ messages in thread
From: David Chang @ 2019-02-01 4:27 UTC (permalink / raw)
To: Heiner Kallweit
Cc: Peter Ceiley, Realtek linux nic maintainers, netdev, Martti Laaksonen
On Jan 31, 2019 at 19:28:20 +0100, Heiner Kallweit wrote:
> Thanks for testing, Peter!
> So we have an ASPM-related issue indeed. I'm aware that there are certain
> incompatibilities between board chipsets and network chip versions
> (although it's not known which combinations are affected).
> And we don't know whether it's a hardware or BIOS issue.
>
> Older driver versions dealt with this by simply disabling ASPM in general.
> As a result all systems with a supported Realtek chip didn't reach higher
> package power-saving states, resulting in significantly reduced battery
> lifetime on notebooks.
> The network driver has no stake in dealing with the ASPM policies, this
> is handled by lower PCI layers.
>
> Unfortunately we can't detect ASPM incompatibilities at runtime. Maybe
> we could build some heuristics based on rx_missed percentage, but it's
> not clear that ASPM issues always show the same symptoms.
>
> So for now people with affected systems have to set a proper
> pcie_aspm.policy parameter.
> Just what is not clear to me is why pcie_aspm=off doesn't help.
>
> @David:
> I assume you'll check with the affected user to test the ASPM policy
> parameter.
Unfortunately, we did not see any performance improvement when
using both kernel parameters.
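One way to double-check whether a pcie_aspm setting changed anything on the link itself is to look at the LnkCtl line of "lspci -vv" for the NIC. The following is a rough sketch that assumes the lspci output format shown earlier in this thread; the helper name link_aspm_state is made up for illustration.

```python
import re


def link_aspm_state(lspci_text):
    """Return the ASPM part of the first LnkCtl line in `lspci -vv` output,
    e.g. "L1 Enabled" or "Disabled", or None if no such line is present."""
    match = re.search(r"LnkCtl:\s*ASPM\s+([^;]+);", lspci_text)
    return match.group(1).strip() if match else None


# LnkCtl line taken from the lspci output posted earlier in this thread.
SAMPLE = "LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+"
print(link_aspm_state(SAMPLE))
```

If the state is the same with and without the kernel parameter, the parameter most likely did not change the link configuration on that system.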
@Peter, thanks for the information.
regards,
David
>
> Heiner
>
>
> On 31.01.2019 13:09, Peter Ceiley wrote:
> > Hi Heiner,
> >
> > A quick update on my testing with different pcie_aspm settings:
> >
> > pcie_aspm=off | no change
> > pcie_aspm.policy=default | no change
> > pcie_aspm.policy=performance | issue resolved
> > pcie_aspm.policy=powersave | issue resolved
> > pcie_aspm.policy=powersupersave | issue resolved
> >
> > It seems the new driver does not play nicely with the default ASPM policy.
> >
> > As requested, I've included an output of ethtool below when experiencing
> > the issue - note that no errors are recorded.
> >
> > # ethtool -S enp3s0
> > NIC statistics:
> > tx_packets: 2749
> > rx_packets: 4089
> > tx_errors: 0
> > rx_errors: 0
> > rx_missed: 0
> > align_errors: 0
> > tx_single_collisions: 0
> > tx_multi_collisions: 0
> > unicast: 4078
> > broadcast: 9
> > multicast: 2
> > tx_aborted: 0
> > tx_underrun: 0
> >
> > David, I hope this helps for your user as well. I appreciate you sharing
> > the bug ticket - thanks.
> >
> > Heiner, thanks very much for your help to date.
> >
> > Regards,
> >
> > Peter.
> >
> > On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@suse.com> wrote:
> >>
> >> Hi Heiner,
> >>
> >> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> >>> Hi David, two more things:
> >>>
> >>> 1. Could you please test a recent linux-next kernel?
> >>> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> >>> and compare them.
> >>
> >> I'm sorry that I do not have the issue machine handy. I would ask
> >> our user to do the test. Thanks!
> >>
> >> Regards,
> >> David
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 31.01.2019 07:21, Heiner Kallweit wrote:
> >>>> David, thanks for the link to the bug ticket.
> >>>> I think only a proper bisect can help to find the offending commit.
> >>>>
> >>>> Heiner
> >>>>
> >>>>
> >>>> On 31.01.2019 03:32, David Chang wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We had a similar case here.
> >>>>> - Realtek r8169 receive performance regression in kernel 4.19
> >>>>> https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>>>>
> >>>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >>>>> The major symptom is a high rx_missed count.
> >>>>>
> >>>>>
> >>>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >>>>>> do the trick, can you also check with pcie_aspm.policy=performance.
> >>>>>
> >>>>> We will give it a try later.
> >>>>>
> >>>>>> And please check with "ethtool -S <if>" whether the chip statistics
> >>>>>> show a significant number of errors.
> >>>>>>
> >>>>>> If this doesn't help you may have to bisect to find the offending commit.
> >>>>>
> >>>>> We tried falling back the driver to a few earlier commits, listed below,
> >>>>> but with no luck.
> >>>>>
> >>>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >>>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >>>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >>>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >>>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>>>>
> >>>>> Thanks,
> >>>>> David Chang
> >>>>>
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>>
> >>>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>>>>> Hi Heiner,
> >>>>>>>
> >>>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>>>>> and this made no difference.
> >>>>>>>
> >>>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>>>>> confirm that this immediately resolved the issue and access to the NFS
> >>>>>>> shares operated as expected.
> >>>>>>>
> >>>>>>> I presume this means it is an issue with the r8169 driver included in
> >>>>>>> 4.19 onwards?
> >>>>>>>
> >>>>>>> To answer your last questions:
> >>>>>>>
> >>>>>>> Base Board Information
> >>>>>>> Manufacturer: Alienware
> >>>>>>> Product Name: 0PGRP5
> >>>>>>> Version: A02
> >>>>>>>
> >>>>>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Peter,
> >>>>>>>>
> >>>>>>>> I think the vendor driver doesn't enable ASPM per default.
> >>>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>>>>>> Few older systems seem to have issues with ASPM, what kind of
> >>>>>>>> system / mainboard are you using? The RTL8168 is the onboard
> >>>>>>>> network chip?
> >>>>>>>>
> >>>>>>>> Rgds, Heiner
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>>>>> Hi Heiner,
> >>>>>>>>>
> >>>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>>>>> a good idea.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>>
> >>>>>>>>> Peter.
> >>>>>>>>>
> >>>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Peter,
> >>>>>>>>>>
> >>>>>>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>>>>>> What you could do:
> >>>>>>>>>>
> >>>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>>>>
> >>>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>>>>
> >>>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>>>>>
> >>>>>>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>>>>>> elsewhere in the network subsystem?
> >>>>>>>>>>
> >>>>>>>>>> Heiner
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>>>>> Hi Heiner,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for getting back to me.
> >>>>>>>>>>>
> >>>>>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>>>>
> >>>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>>>>> troubleshoot this issue. Running the following
> >>>>>>>>>>>
> >>>>>>>>>>> netstat -s |grep retransmitted
> >>>>>>>>>>>
> >>>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>>>>> 4.19.18:
> >>>>>>>>>>>
> >>>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>>>>> the following:
> >>>>>>>>>>> real 0m19.867s
> >>>>>>>>>>> user 0m0.012s
> >>>>>>>>>>> sys 0m0.036s
> >>>>>>>>>>>
> >>>>>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>>>>> real 0m0.300s
> >>>>>>>>>>> user 0m0.004s
> >>>>>>>>>>> sys 0m0.007s
> >>>>>>>>>>>
> >>>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>>>>>
> >>>>>>>>>>> dmesg XID:
> >>>>>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>>>>>
> >>>>>>>>>>> # lspci -vv
> >>>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>>>>> Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>>>>> Interrupt: pin A routed to IRQ 19
> >>>>>>>>>>> Region 0: I/O ports at d000 [size=256]
> >>>>>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>>>>> Capabilities: [40] Power Management version 3
> >>>>>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>>>>> Address: 0000000000000000 Data: 0000
> >>>>>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>>>>> <512ns, L1 <64us
> >>>>>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>>>>> OBFF Via message/WAKE#
> >>>>>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>>>>> OBFF Disabled
> >>>>>>>>>>> AtomicOpsCtl: ReqEn-
> >>>>>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>>>>> Transmit Margin: Normal Operating Range,
> >>>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>>>>> Compliance De-emphasis: -6dB
> >>>>>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>>>>> Vector table: BAR=4 offset=00000000
> >>>>>>>>>>> PBA: BAR=4 offset=00000800
> >>>>>>>>>>> Capabilities: [d0] Vital Product Data
> >>>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>>>>> Not readable
> >>>>>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>>>>> Capabilities: [140 v1] Virtual Channel
> >>>>>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
> >>>>>>>>>>> Ctrl: ArbSelect=Fixed
> >>>>>>>>>>> Status: InProgress-
> >>>>>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>>>>> Status: NegoPending- InProgress-
> >>>>>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>>>>> Max snoop latency: 71680ns
> >>>>>>>>>>> Max no snoop latency: 71680ns
> >>>>>>>>>>> Kernel driver in use: r8169
> >>>>>>>>>>> Kernel modules: r8169
> >>>>>>>>>>>
> >>>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> Peter.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>>>>> situation.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> lshw shows:
> >>>>>>>>>>>>> description: Ethernet interface
> >>>>>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>>>> physical id: 0
> >>>>>>>>>>>>> bus info: pci@0000:03:00.0
> >>>>>>>>>>>>> logical name: enp3s0
> >>>>>>>>>>>>> version: 0c
> >>>>>>>>>>>>> serial:
> >>>>>>>>>>>>> size: 1Gbit/s
> >>>>>>>>>>>>> capacity: 1Gbit/s
> >>>>>>>>>>>>> width: 64 bits
> >>>>>>>>>>>>> clock: 33MHz
> >>>>>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>>>>> resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kind Regards,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Peter.
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>
> >>>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Can you provide any measurements?
> >>>>>>>>>>>> - iperf results before and after
> >>>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
> >>>>>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Heiner
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >
>
>
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 6:35 ` Heiner Kallweit
2019-01-31 6:49 ` Heiner Kallweit
2019-01-31 7:23 ` David Chang
@ 2019-02-01 4:29 ` David Chang
2019-02-01 6:32 ` Heiner Kallweit
2 siblings, 1 reply; 25+ messages in thread
From: David Chang @ 2019-02-01 4:29 UTC (permalink / raw)
To: Heiner Kallweit
Cc: Peter Ceiley, Realtek linux nic maintainers, netdev, Martti Laaksonen
On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> Hi David, two more things:
>
> 1. Could you please test a recent linux-next kernel?
Not tested yet. Will do if possible.
> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> and compare them.
For your information.
[with pcie_aspm=off]
--- v4.18.15 2019-02-01 12:11:56.019051828 +0800
+++ v4.9.11 2019-02-01 12:12:26.827439645 +0800
@@ -3,18 +3,19 @@
Offset Values
------ ------
0x0000: ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
-0x0010: 00 10 38 0e 04 00 00 00 78 00 06 00 00 00 00 00
-0x0020: 00 f0 9b f6 03 00 00 00 00 00 00 00 00 00 00 00
+0x0010: 00 f0 ba 0d 04 00 00 00 78 00 06 00 00 00 00 00
+0x0020: 00 d0 35 f7 03 00 00 00 00 00 00 00 00 00 00 00
0x0030: 00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
-0x0040: 80 0f 10 57 0e cf 02 00 00 cf ba 34 00 00 00 00
-0x0050: 10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
+0x0040: 80 0f 10 57 0e cf 02 00 00 d8 c7 50 00 00 00 00
+0x0050: 10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
0x0060: 00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
-0x0070: 00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
+0x0070: 00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 76 d0
0x0080: 8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-0x00b0: 7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
+0x00b0: 7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
0x00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00d0: 21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
-0x00e0: e1 20 51 51 00 30 94 f6 03 00 00 00 27 00 00 00
+0x00e0: e1 20 51 51 00 e0 35 f7 03 00 00 00 27 00 00 00
0x00f0: 3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00
[pcie_aspm.policy=performance]
--- v4.18.15-p 2019-02-01 12:18:46.919221060 +0800
+++ v4.9.11-p 2019-02-01 12:19:09.207474824 +0800
@@ -3,18 +3,19 @@
Offset Values
------ ------
0x0000: ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
-0x0010: 00 f0 bc 0d 04 00 00 00 78 00 06 00 00 00 00 00
-0x0020: 00 60 2e f7 03 00 00 00 00 00 00 00 00 00 00 00
+0x0010: 00 c0 22 09 04 00 00 00 78 00 06 00 00 00 00 00
+0x0020: 00 f0 e5 f4 03 00 00 00 00 00 00 00 00 00 00 00
0x0030: 00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
-0x0040: 80 0f 10 57 0e cf 02 00 00 53 50 1a 00 00 00 00
-0x0050: 10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
+0x0040: 80 0f 10 57 0e cf 02 00 00 d2 35 7b 00 00 00 00
+0x0050: 10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
0x0060: 00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
-0x0070: 00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
+0x0070: 00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 a4 a0
0x0080: 8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-0x00b0: 7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
+0x00b0: 7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
0x00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00d0: 21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
-0x00e0: e1 20 51 51 00 70 2e f7 03 00 00 00 27 00 00 00
+0x00e0: e1 20 51 51 00 00 e6 f4 03 00 00 00 27 00 00 00
0x00f0: 3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00
Thanks,
David
> Heiner
>
>
> On 31.01.2019 07:21, Heiner Kallweit wrote:
> > David, thanks for the link to the bug ticket.
> > I think only a proper bisect can help to find the offending commit.
> >
> > Heiner
> >
> >
> > On 31.01.2019 03:32, David Chang wrote:
> >> Hi,
> >>
> >> We had a similar case here.
> >> - Realtek r8169 receive performance regression in kernel 4.19
> >> https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>
> >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >> The major symptom is a high rx_missed count.
> >>
> >>
> >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>> Hi Peter,
> >>>
> >>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >>> do the trick, can you also check with pcie_aspm.policy=performance.
> >>
> >> We will give it a try later.
> >>
> >>> And please check with "ethtool -S <if>" whether the chip statistics
> >>> show a significant number of errors.
> >>>
> >>> If this doesn't help you may have to bisect to find the offending commit.
> >>
> >> We tried falling back the driver to a few earlier commits, listed below,
> >> but with no luck.
> >>
> >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>
> >> Thanks,
> >> David Chang
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>> Hi Heiner,
> >>>>
> >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>> and this made no difference.
> >>>>
> >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>> confirm that this immediately resolved the issue and access to the NFS
> >>>> shares operated as expected.
> >>>>
> >>>> I presume this means it is an issue with the r8169 driver included in
> >>>> 4.19 onwards?
> >>>>
> >>>> To answer your last questions:
> >>>>
> >>>> Base Board Information
> >>>> Manufacturer: Alienware
> >>>> Product Name: 0PGRP5
> >>>> Version: A02
> >>>>
> >>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Peter.
> >>>>
> >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>
> >>>>> Hi Peter,
> >>>>>
> >>>>> I think the vendor driver doesn't enable ASPM per default.
> >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>>> Few older systems seem to have issues with ASPM, what kind of
> >>>>> system / mainboard are you using? The RTL8168 is the onboard
> >>>>> network chip?
> >>>>>
> >>>>> Rgds, Heiner
> >>>>>
> >>>>>
> >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>> Hi Heiner,
> >>>>>>
> >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>> a good idea.
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Peter.
> >>>>>>
> >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Peter,
> >>>>>>>
> >>>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>>> What you could do:
> >>>>>>>
> >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>
> >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>
> >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>>
> >>>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>>> elsewhere in the network subsystem?
> >>>>>>>
> >>>>>>> Heiner
> >>>>>>>
> >>>>>>>
> >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>> Hi Heiner,
> >>>>>>>>
> >>>>>>>> Thanks for getting back to me.
> >>>>>>>>
> >>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>
> >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>
> >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>> troubleshoot this issue. Running the following
> >>>>>>>>
> >>>>>>>> netstat -s |grep retransmitted
> >>>>>>>>
> >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>> 4.19.18:
> >>>>>>>>
> >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>> the following:
> >>>>>>>> real 0m19.867s
> >>>>>>>> user 0m0.012s
> >>>>>>>> sys 0m0.036s
> >>>>>>>>
> >>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>> real 0m0.300s
> >>>>>>>> user 0m0.004s
> >>>>>>>> sys 0m0.007s
> >>>>>>>>
> >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>>
> >>>>>>>> dmesg XID:
> >>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>>
> >>>>>>>> # lspci -vv
> >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>> Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>> Interrupt: pin A routed to IRQ 19
> >>>>>>>> Region 0: I/O ports at d000 [size=256]
> >>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>> Capabilities: [40] Power Management version 3
> >>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>> Address: 0000000000000000 Data: 0000
> >>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>> <512ns, L1 <64us
> >>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>> OBFF Via message/WAKE#
> >>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>> OBFF Disabled
> >>>>>>>> AtomicOpsCtl: ReqEn-
> >>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>> Transmit Margin: Normal Operating Range,
> >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>> Compliance De-emphasis: -6dB
> >>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>> Vector table: BAR=4 offset=00000000
> >>>>>>>> PBA: BAR=4 offset=00000800
> >>>>>>>> Capabilities: [d0] Vital Product Data
> >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>> Not readable
> >>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>> Capabilities: [140 v1] Virtual Channel
> >>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
> >>>>>>>> Ctrl: ArbSelect=Fixed
> >>>>>>>> Status: InProgress-
> >>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>> Status: NegoPending- InProgress-
> >>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>> Max snoop latency: 71680ns
> >>>>>>>> Max no snoop latency: 71680ns
> >>>>>>>> Kernel driver in use: r8169
> >>>>>>>> Kernel modules: r8169
> >>>>>>>>
> >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> Peter.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>
> >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>
> >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>> situation.
> >>>>>>>>>>
> >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>
> >>>>>>>>>> lshw shows:
> >>>>>>>>>> description: Ethernet interface
> >>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>> physical id: 0
> >>>>>>>>>> bus info: pci@0000:03:00.0
> >>>>>>>>>> logical name: enp3s0
> >>>>>>>>>> version: 0c
> >>>>>>>>>> serial:
> >>>>>>>>>> size: 1Gbit/s
> >>>>>>>>>> capacity: 1Gbit/s
> >>>>>>>>>> width: 64 bits
> >>>>>>>>>> clock: 33MHz
> >>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>> resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>
> >>>>>>>>>> Kind Regards,
> >>>>>>>>>>
> >>>>>>>>>> Peter.
> >>>>>>>>>>
> >>>>>>>>> Hi Peter,
> >>>>>>>>>
> >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>
> >>>>>>>>> - Can you provide any measurements?
> >>>>>>>>> - iperf results before and after
> >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>
> >>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
> >>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>
> >>>>>>>>> Heiner
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-02-01 4:29 ` David Chang
@ 2019-02-01 6:32 ` Heiner Kallweit
0 siblings, 0 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-01 6:32 UTC (permalink / raw)
To: David Chang
Cc: Peter Ceiley, Realtek linux nic maintainers, netdev, Martti Laaksonen
Thanks, however the register diff is a little hard to read.
Usually ethtool -d outputs something like this:
RealTek RTL8168g/8111g registers:
--------------------------------------------------------
0x00: MAC Address 00:01:2e:83:90:11
0x08: Multicast Address Filter 0x100000c0 0x00000084
0x10: Dump Tally Counter Command 0x78c43000 0x00000001
0x20: Tx Normal Priority Ring Addr 0x77cc4000 0x00000001
0x28: Tx High Priority Ring Addr 0x00000000 0x00000000
0x30: Flash memory read/write 0x00000000
0x34: Early Rx Byte Count 0
0x36: Early Rx Status 0x00
0x37: Command 0x0c
Rx on, Tx on
0x3C: Interrupt Mask 0x003f
LinkChg RxNoBuf TxErr TxOK RxErr RxOK
0x3E: Interrupt Status 0x0000
0x40: Tx Configuration 0x4f000f80
0x44: Rx Configuration 0x0002cf0e
0x48: Timer count 0x00000000
0x4C: Missed packet counter 0x000000
0x50: EEPROM Command 0x10
0x51: Config 0 0x00
0x52: Config 1 0xcf
0x53: Config 2 0x9c
0x54: Config 3 0x60
0x55: Config 4 0x50
0x56: Config 5 0x01
0x58: Timer interrupt 0x00000000
0x5C: Multiple Interrupt Select 0x0000
0x60: PHY access 0x00000000
0x64: TBI control and status 0x00000000
0x68: TBI Autonegotiation advertisement (ANAR) 0x0000
0x6A: TBI Link partner ability (LPAR) 0x0000
0x6C: PHY status 0xf3
..
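That said, the raw dumps can still be compared byte by byte and mapped back to the register names above. A minimal Python sketch, assuming the raw `0xNNNN: xx xx ...` line format used in the quoted diff; the function names and the small name table are illustrative only, with offsets copied from the decoded listing above:

```python
import re

# Register names by offset, copied from the decoded ethtool -d listing.
# Only a few entries are included; the table is illustrative, not complete.
REG_NAMES = {
    0x37: "Command",
    0x52: "Config 1",
    0x53: "Config 2",
    0x54: "Config 3",
    0x55: "Config 4",
    0x56: "Config 5",
    0x6C: "PHY status",
}

# Matches raw dump rows such as "0x0050: 10 00 cf 18 ...".
LINE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{2}\s*)+)$")


def parse_raw_dump(text):
    """Turn raw dump rows into a {offset: byte_value} dict."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip headers, separators and blank lines
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            regs[base + i] = int(byte, 16)
    return regs


def diff_dumps(old, new):
    """Return (offset, old_value, new_value, name) for every differing byte."""
    changes = []
    for off in sorted(set(old) & set(new)):
        if old[off] != new[off]:
            changes.append((off, old[off], new[off], REG_NAMES.get(off, "?")))
    return changes


# The 0x0050 row from the two dumps taken with pcie_aspm=off.
OLD = "0x0050: 10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00"
NEW = "0x0050: 10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00"

for off, a, b, name in diff_dumps(parse_raw_dump(OLD), parse_raw_dump(NEW)):
    print(f"0x{off:02x} {name}: 0x{a:02x} -> 0x{b:02x}")
```

For that row the sketch reports Config 2 (0x18 -> 0x98) and Config 5 (0x00 -> 0x01). Offsets such as 0x10 and 0x20 hold addresses according to the listing above, so differences there are expected between boots.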
On 01.02.2019 05:29, David Chang wrote:
> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
>> Hi David, two more things:
>>
>> 1. Could you please test a recent linux-next kernel?
>
> Not tested yet. Will do if possible.
>
>> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>> and compare them.
>
> For your information.
>
> [with pcie_aspm=off]
> --- v4.18.15 2019-02-01 12:11:56.019051828 +0800
> +++ v4.9.11 2019-02-01 12:12:26.827439645 +0800
> @@ -3,18 +3,19 @@
> Offset Values
> ------ ------
> 0x0000: ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
> -0x0010: 00 10 38 0e 04 00 00 00 78 00 06 00 00 00 00 00
> -0x0020: 00 f0 9b f6 03 00 00 00 00 00 00 00 00 00 00 00
> +0x0010: 00 f0 ba 0d 04 00 00 00 78 00 06 00 00 00 00 00
> +0x0020: 00 d0 35 f7 03 00 00 00 00 00 00 00 00 00 00 00
> 0x0030: 00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
> -0x0040: 80 0f 10 57 0e cf 02 00 00 cf ba 34 00 00 00 00
> -0x0050: 10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
> +0x0040: 80 0f 10 57 0e cf 02 00 00 d8 c7 50 00 00 00 00
> +0x0050: 10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
> 0x0060: 00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
> -0x0070: 00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
> +0x0070: 00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 76 d0
> 0x0080: 8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> -0x00b0: 7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
> +0x00b0: 7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
> 0x00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00d0: 21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
> -0x00e0: e1 20 51 51 00 30 94 f6 03 00 00 00 27 00 00 00
> +0x00e0: e1 20 51 51 00 e0 35 f7 03 00 00 00 27 00 00 00
> 0x00f0: 3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00
>
> [pcie_aspm.policy=performance]
> --- v4.18.15-p 2019-02-01 12:18:46.919221060 +0800
> +++ v4.9.11-p 2019-02-01 12:19:09.207474824 +0800
> @@ -3,18 +3,19 @@
> Offset Values
> ------ ------
> 0x0000: ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
> -0x0010: 00 f0 bc 0d 04 00 00 00 78 00 06 00 00 00 00 00
> -0x0020: 00 60 2e f7 03 00 00 00 00 00 00 00 00 00 00 00
> +0x0010: 00 c0 22 09 04 00 00 00 78 00 06 00 00 00 00 00
> +0x0020: 00 f0 e5 f4 03 00 00 00 00 00 00 00 00 00 00 00
> 0x0030: 00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
> -0x0040: 80 0f 10 57 0e cf 02 00 00 53 50 1a 00 00 00 00
> -0x0050: 10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
> +0x0040: 80 0f 10 57 0e cf 02 00 00 d2 35 7b 00 00 00 00
> +0x0050: 10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
> 0x0060: 00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
> -0x0070: 00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
> +0x0070: 00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 a4 a0
> 0x0080: 8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> -0x00b0: 7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
> +0x00b0: 7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
> 0x00c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 0x00d0: 21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
> -0x00e0: e1 20 51 51 00 70 2e f7 03 00 00 00 27 00 00 00
> +0x00e0: e1 20 51 51 00 00 e6 f4 03 00 00 00 27 00 00 00
> 0x00f0: 3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00
>
> Thanks,
> David
>
>> Heiner
>>
>>
>> On 31.01.2019 07:21, Heiner Kallweit wrote:
>>> David, thanks for the link to the bug ticket.
>>> I think only a proper bisect can help to find the offending commit.
>>>
>>> Heiner
>>>
>>>
>>> On 31.01.2019 03:32, David Chang wrote:
>>>> Hi,
>>>>
>>>> We had a similar case here.
>>>> - Realtek r8169 receive performance regression in kernel 4.19
>>>> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>>>
>>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>>>> The major symptom is a high rx_missed count.
>>>>
>>>>
>>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>>>> Hi Peter,
>>>>>
>>>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>>>>> do the trick, can you also check with pcie_aspm.policy=performance.
>>>>
>>>> We will give it a try later.
>>>>
>>>>> And please check with "ethtool -S <if>" whether the chip statistics
>>>>> show a significant number of errors.
>>>>>
>>>>> If this doesn't help you may have to bisect to find the offending commit.
>>>>
>>>> We tried falling the driver back to a few previous commits, as follows,
>>>> but with no luck.
>>>>
>>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>>>
>>>> Thanks,
>>>> David Chang
>>>>
>>>>>
>>>>> Heiner
>>>>>
>>>>>
>>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>>>> Hi Heiner,
>>>>>>
>>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>>>> and this made no difference.
>>>>>>
>>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>>>> confirm that this immediately resolved the issue and access to the NFS
>>>>>> shares operated as expected.
>>>>>>
>>>>>> I presume this means it is an issue with the r8169 driver included in
>>>>>> 4.19 onwards?
>>>>>>
>>>>>> To answer your last questions:
>>>>>>
>>>>>> Base Board Information
>>>>>> Manufacturer: Alienware
>>>>>> Product Name: 0PGRP5
>>>>>> Version: A02
>>>>>>
>>>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Peter.
>>>>>>
>>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> I think the vendor driver doesn't enable ASPM by default.
>>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>>>> A few older systems seem to have issues with ASPM. What kind of
>>>>>>> system / mainboard are you using? Is the RTL8168 the onboard
>>>>>>> network chip?
>>>>>>>
>>>>>>> Rgds, Heiner
>>>>>>>
>>>>>>>
>>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>>>> Hi Heiner,
>>>>>>>>
>>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>>>> a good idea.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>>>> What you could do:
>>>>>>>>>
>>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>>>
>>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>>>
>>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>>>
>>>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>>>> elsewhere in the network subsystem?
>>>>>>>>>
>>>>>>>>> Heiner
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>>>> Hi Heiner,
>>>>>>>>>>
>>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>>
>>>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>>>
>>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>>>
>>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>>>
>>>>>>>>>> netstat -s |grep retransmitted
>>>>>>>>>>
>>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>>>> 4.19.18:
>>>>>>>>>>
>>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>>>> the following:
>>>>>>>>>> real 0m19.867s
>>>>>>>>>> user 0m0.012s
>>>>>>>>>> sys 0m0.036s
>>>>>>>>>>
>>>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>>> real 0m0.300s
>>>>>>>>>> user 0m0.004s
>>>>>>>>>> sys 0m0.007s
>>>>>>>>>>
>>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>>>
>>>>>>>>>> dmesg XID:
>>>>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>>>
>>>>>>>>>> # lspci -vv
>>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>>> Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>>> Interrupt: pin A routed to IRQ 19
>>>>>>>>>> Region 0: I/O ports at d000 [size=256]
>>>>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>>> Capabilities: [40] Power Management version 3
>>>>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>>> Address: 0000000000000000 Data: 0000
>>>>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>>>> <512ns, L1 <64us
>>>>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>>>> OBFF Disabled
>>>>>>>>>> AtomicOpsCtl: ReqEn-
>>>>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>>> Transmit Margin: Normal Operating Range,
>>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>>> Compliance De-emphasis: -6dB
>>>>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>>> Vector table: BAR=4 offset=00000000
>>>>>>>>>> PBA: BAR=4 offset=00000800
>>>>>>>>>> Capabilities: [d0] Vital Product Data
>>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>>> Not readable
>>>>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>>> Capabilities: [140 v1] Virtual Channel
>>>>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>>>>> Ctrl: ArbSelect=Fixed
>>>>>>>>>> Status: InProgress-
>>>>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>>> Status: NegoPending- InProgress-
>>>>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>>> Max snoop latency: 71680ns
>>>>>>>>>> Max no snoop latency: 71680ns
>>>>>>>>>> Kernel driver in use: r8169
>>>>>>>>>> Kernel modules: r8169
>>>>>>>>>>
>>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Peter.
>>>>>>>>>>
>>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>>>
>>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>>>
>>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>>>> situation.
>>>>>>>>>>>>
>>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>>>
>>>>>>>>>>>> lshw shows:
>>>>>>>>>>>> description: Ethernet interface
>>>>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>>> physical id: 0
>>>>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>>>>> logical name: enp3s0
>>>>>>>>>>>> version: 0c
>>>>>>>>>>>> serial:
>>>>>>>>>>>> size: 1Gbit/s
>>>>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>>>>> width: 64 bits
>>>>>>>>>>>> clock: 33MHz
>>>>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>>>
>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Peter.
>>>>>>>>>>>>
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>>>
>>>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>>> - iperf results before and after
>>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>>>
>>>>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
>>>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>>>
>>>>>>>>>>> Heiner
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 2:32 ` David Chang
2019-01-31 6:21 ` Heiner Kallweit
@ 2019-02-02 12:25 ` Heiner Kallweit
2019-02-05 18:50 ` Heiner Kallweit
2 siblings, 0 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-02 12:25 UTC (permalink / raw)
To: David Chang; +Cc: Realtek linux nic maintainers, netdev
Hi David,
to check another potential incompatibility:
Could you please test a 4.19 version with the following line disabled.
Rgds, Heiner
---
drivers/net/ethernet/realtek/r8169.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e8a112149..6ef89f518 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5334,7 +5334,7 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
r8168_mac_ocp_write(tp, 0xc094, 0x0000);
r8168_mac_ocp_write(tp, 0xc09e, 0x0000);
- rtl_hw_aspm_clkreq_enable(tp, true);
+ // rtl_hw_aspm_clkreq_enable(tp, true);
}
static void rtl_hw_start_8168ep(struct rtl8169_private *tp)
--
2.20.1
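For context, a rough sketch of what this call is expected to change in the register dump. It assumes that ClkReqEn is bit 7 of Config2 (offset 0x53) and ASPM_en is bit 0 of Config5 (offset 0x56); please verify both against the r8169.c in your tree. The Python helper name is made up for illustration:

```python
# Assumed bit positions; verify against the definitions in r8169.c.
CLKREQ_EN = 1 << 7   # Config2 (offset 0x53)
ASPM_EN = 1 << 0     # Config5 (offset 0x56)


def aspm_clkreq_state(config2, config5):
    """Report whether the chip-side CLKREQ and ASPM enable bits are set."""
    return {
        "clkreq_en": bool(config2 & CLKREQ_EN),
        "aspm_en": bool(config5 & ASPM_EN),
    }


# Config2/Config5 bytes as they appear at offsets 0x53 and 0x56 in the
# two register dumps David posted on 2019-02-01.
print("4.18 :", aspm_clkreq_state(0x18, 0x00))
print("newer:", aspm_clkreq_state(0x98, 0x01))
```

Under these assumptions both bits are clear in the 4.18 dump and set in the newer one. With the line above disabled, a fresh `ethtool -d` dump would be expected to show them clear again.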
On 31.01.2019 03:32, David Chang wrote:
> Hi,
>
> We had a similar case here.
> - Realtek r8169 receive performance regression in kernel 4.19
> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> The major symptom is a high rx_missed count.
>
>
> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>> Hi Peter,
>>
>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>> do the trick, can you also check with pcie_aspm.policy=performance.
>
> We will give it a try later.
>
>> And please check with "ethtool -S <if>" whether the chip statistics
>> show a significant number of errors.
>>
>> If this doesn't help you may have to bisect to find the offending commit.
>
> We tried falling the driver back to a few previous commits, as follows,
> but with no luck.
>
> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>
> Thanks,
> David Chang
>
>>
>> Heiner
>>
>>
>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>> and this made no difference.
>>>
>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>> confirm that this immediately resolved the issue and access to the NFS
>>> shares operated as expected.
>>>
>>> I presume this means it is an issue with the r8169 driver included in
>>> 4.19 onwards?
>>>
>>> To answer your last questions:
>>>
>>> Base Board Information
>>> Manufacturer: Alienware
>>> Product Name: 0PGRP5
>>> Version: A02
>>>
>>> ... and yes, the RTL8168 is the onboard network chip.
>>>
>>> Regards,
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> I think the vendor driver doesn't enable ASPM by default.
>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>> A few older systems seem to have issues with ASPM. What kind of
>>>> system / mainboard are you using? Is the RTL8168 the onboard
>>>> network chip?
>>>>
>>>> Rgds, Heiner
>>>>
>>>>
>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>> a good idea.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>> What you could do:
>>>>>>
>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>
>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>
>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>
>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>> elsewhere in the network subsystem?
>>>>>>
>>>>>> Heiner
>>>>>>
>>>>>>
>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks for getting back to me.
>>>>>>>
>>>>>>> No, I don't use jumbo packets.
>>>>>>>
>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>
>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>> troubleshoot this issue. Running the following
>>>>>>>
>>>>>>> netstat -s |grep retransmitted
>>>>>>>
>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>> 4.19.18:
>>>>>>>
>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>> the following:
>>>>>>> real 0m19.867s
>>>>>>> user 0m0.012s
>>>>>>> sys 0m0.036s
>>>>>>>
>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>> 4.18.16 and 'time' showed:
>>>>>>> real 0m0.300s
>>>>>>> user 0m0.004s
>>>>>>> sys 0m0.007s
>>>>>>>
>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>
>>>>>>> dmesg XID:
>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>
>>>>>>> # lspci -vv
>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>> Latency: 0, Cache Line Size: 64 bytes
>>>>>>> Interrupt: pin A routed to IRQ 19
>>>>>>> Region 0: I/O ports at d000 [size=256]
>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>> Capabilities: [40] Power Management version 3
>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>> Address: 0000000000000000 Data: 0000
>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>> <512ns, L1 <64us
>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>> SlotPowerLimit 10.000W
>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>> OBFF Via message/WAKE#
>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>> OBFF Disabled
>>>>>>> AtomicOpsCtl: ReqEn-
>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>> Transmit Margin: Normal Operating Range,
>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>> Compliance De-emphasis: -6dB
>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>> Vector table: BAR=4 offset=00000000
>>>>>>> PBA: BAR=4 offset=00000800
>>>>>>> Capabilities: [d0] Vital Product Data
>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>> Not readable
>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>> Capabilities: [140 v1] Virtual Channel
>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>> Ctrl: ArbSelect=Fixed
>>>>>>> Status: InProgress-
>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>> Status: NegoPending- InProgress-
>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>> Max snoop latency: 71680ns
>>>>>>> Max no snoop latency: 71680ns
>>>>>>> Kernel driver in use: r8169
>>>>>>> Kernel modules: r8169
>>>>>>>
>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>
>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>
>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>> situation.
>>>>>>>>>
>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>
>>>>>>>>> lshw shows:
>>>>>>>>> description: Ethernet interface
>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>> physical id: 0
>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>> logical name: enp3s0
>>>>>>>>> version: 0c
>>>>>>>>> serial:
>>>>>>>>> size: 1Gbit/s
>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>> width: 64 bits
>>>>>>>>> clock: 33MHz
>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>
>>>>>>>> - Can you provide any measurements?
>>>>>>>> - iperf results before and after
>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>> - Do you use jumbo packets?
>>>>>>>>
>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>
>>>>>>>> Heiner
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-01-31 2:32 ` David Chang
2019-01-31 6:21 ` Heiner Kallweit
2019-02-02 12:25 ` Heiner Kallweit
@ 2019-02-05 18:50 ` Heiner Kallweit
2019-02-05 18:53 ` Heiner Kallweit
` (2 more replies)
2 siblings, 3 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-05 18:50 UTC (permalink / raw)
To: David Chang; +Cc: Realtek linux nic maintainers, netdev
Hi David,
meanwhile there's the following bug report matching what you reported.
It's even the same chip version (RTL8168h).
https://bugzilla.redhat.com/show_bug.cgi?id=1671958
Symptom there is also a significant number of rx_missed packets.
Could you try what I mentioned there last:
Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
end of rtl_hw_start_8168h_1() being disabled.
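To compare such a test build against the stock driver, the rx_missed
counter from "ethtool -S <if>" can be read before and after a test run.
A minimal sketch in Python, assuming the counter shows up as a
"name: value" line in that output; the helper names and the interface
name passed in are hypothetical and not part of the driver or ethtool:

```python
import subprocess


def parse_ethtool_stats(text):
    """Parse 'name: value' lines from `ethtool -S <if>` output into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.lstrip("-").isdigit():
            stats[name.strip()] = int(value)
    return stats


def read_stats(ifname):
    """Run `ethtool -S <ifname>` (requires ethtool) and parse its output."""
    out = subprocess.run(
        ["ethtool", "-S", ifname],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ethtool_stats(out)


def counter_delta(before, after, key="rx_missed"):
    """Difference of one counter between two snapshots (missing key = 0)."""
    return after.get(key, 0) - before.get(key, 0)
```

Taking one snapshot with read_stats() before the test and one after it,
counter_delta() then gives the number of packets missed during the run.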
Heiner
On 31.01.2019 03:32, David Chang wrote:
> Hi,
>
> We had a similar case here.
> - Realtek r8169 receive performance regression in kernel 4.19
> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> The major symptom is a high rx_missed count.
>
>
> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>> Hi Peter,
>>
>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>> do the trick, can you also check with pcie_aspm.policy=performance.
>
> We will give it a try later.
>
>> And please check with "ethtool -S <if>" whether the chip statistics
>> show a significant number of errors.
>>
>> If this doesn't help you may have to bisect to find the offending commit.
>
> We tried falling back the driver to a few earlier commits, listed below,
> but with no luck.
>
> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>
> Thanks,
> David Chang
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-02-05 18:50 ` Heiner Kallweit
@ 2019-02-05 18:53 ` Heiner Kallweit
2019-02-11 6:23 ` David Chang
2019-02-14 2:45 ` David Chang
2 siblings, 0 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-05 18:53 UTC (permalink / raw)
To: David Chang; +Cc: Realtek linux nic maintainers, netdev
By the way: I can't reproduce the issue on a RTL8168g.
So it doesn't seem to be an issue with generic code in the driver.
I would assume it's some kind of incompatibility between activated
chip settings (ASPM etc) and certain systems.
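When comparing affected and unaffected systems, the ASPM state the link
is actually running with can be taken from the LnkCtl line of
"lspci -vv" (Peter's output shows "LnkCtl: ASPM L1 Enabled"). A minimal
sketch in Python, assuming that line format; the helper name is
hypothetical:

```python
import re


def aspm_state(lspci_text):
    """Return the ASPM state from the LnkCtl line of `lspci -vv` output.

    Returns e.g. "L1 Enabled" or "Disabled", or None if no LnkCtl line
    with an ASPM field is present.
    """
    match = re.search(r"LnkCtl:\s*ASPM\s+([^;]+);", lspci_text)
    return match.group(1).strip() if match else None
```

Only LnkCtl is matched on purpose: the LnkCap line lists what the device
supports, not what is currently enabled.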
Heiner
On 05.02.2019 19:50, Heiner Kallweit wrote:
> Hi David,
>
> Meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
>
> Symptom there is also a significant number of rx_missed packets.
> Could you try what I mentioned there last:
> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> end of rtl_hw_start_8168h_1() being disabled.
>
> Heiner
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-02-05 18:50 ` Heiner Kallweit
2019-02-05 18:53 ` Heiner Kallweit
@ 2019-02-11 6:23 ` David Chang
2019-02-14 2:45 ` David Chang
2 siblings, 0 replies; 25+ messages in thread
From: David Chang @ 2019-02-11 6:23 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen
Hi Heiner,
Sorry for the late reply!
On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> Hi David,
>
> Meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
>
> Symptom there is also a significant number of rx_missed packets.
> Could you try what I mentioned there last:
> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> end of rtl_hw_start_8168h_1() being disabled.
Will do.
Thanks,
David Chang
> Heiner
>
>
> On 31.01.2019 03:32, David Chang wrote:
> > Hi,
> >
> > We had a similar case here.
> > - Realtek r8169 receive performance regression in kernel 4.19
> > https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >
> > kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > The major symptom is a high rx_missed count.
> >
> >
> > On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >> Hi Peter,
> >>
> >> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >> do the trick, can you also check with pcie_aspm.policy=performance.
> >
> > We will give it a try later.
> >
> >> And please check with "ethtool -S <if>" whether the chip statistics
> >> show a significant number of errors.
> >>
> >> If this doesn't help you may have to bisect to find the offending commit.
> >
> > We had tried fallback driver to a few previous commits as following,
> > but with no luck.
> >
> > 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >
> > Thanks,
> > David Chang
> >
> >>
> >> Heiner
> >>
> >>
> >> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>> and this made no difference.
> >>>
> >>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>> confirm that this immediately resolved the issue and access to the NFS
> >>> shares operated as expected.
> >>>
> >>> I presume this means it is an issue with the r8169 driver included in
> >>> 4.19 onwards?
> >>>
> >>> To answer your last questions:
> >>>
> >>> Base Board Information
> >>> Manufacturer: Alienware
> >>> Product Name: 0PGRP5
> >>> Version: A02
> >>>
> >>> ... and yes, the RTL8168 is the onboard network chip.
> >>>
> >>> Regards,
> >>>
> >>> Peter.
> >>>
> >>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> I think the vendor driver doesn't enable ASPM per default.
> >>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>> Few older systems seem to have issues with ASPM, what kind of
> >>>> system / mainboard are you using? The RTL8168 is the onboard
> >>>> network chip?
> >>>>
> >>>> Rgds, Heiner
> >>>>
> >>>>
> >>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>> Hi Heiner,
> >>>>>
> >>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>> a good idea.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>> What you could do:
> >>>>>>
> >>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>
> >>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>
> >>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>
> >>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>> elsewhere in the network subsystem?
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>>
> >>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>> Hi Heiner,
> >>>>>>>
> >>>>>>> Thanks for getting back to me.
> >>>>>>>
> >>>>>>> No, I don't use jumbo packets.
> >>>>>>>
> >>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>
> >>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>> troubleshoot this issue. Running the following
> >>>>>>>
> >>>>>>> netstat -s |grep retransmitted
> >>>>>>>
> >>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>> 4.19.18:
> >>>>>>>
> >>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>> the following:
> >>>>>>> real 0m19.867s
> >>>>>>> user 0m0.012s
> >>>>>>> sys 0m0.036s
> >>>>>>>
> >>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>> 4.18.16 and 'time' showed:
> >>>>>>> real 0m0.300s
> >>>>>>> user 0m0.004s
> >>>>>>> sys 0m0.007s
> >>>>>>>
> >>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>
> >>>>>>> dmesg XID:
> >>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>
> >>>>>>> # lspci -vv
> >>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>> Latency: 0, Cache Line Size: 64 bytes
> >>>>>>> Interrupt: pin A routed to IRQ 19
> >>>>>>> Region 0: I/O ports at d000 [size=256]
> >>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>> Capabilities: [40] Power Management version 3
> >>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>> Address: 0000000000000000 Data: 0000
> >>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>> <512ns, L1 <64us
> >>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>> SlotPowerLimit 10.000W
> >>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>> OBFF Via message/WAKE#
> >>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>> OBFF Disabled
> >>>>>>> AtomicOpsCtl: ReqEn-
> >>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>> Transmit Margin: Normal Operating Range,
> >>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>> Compliance De-emphasis: -6dB
> >>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>> Vector table: BAR=4 offset=00000000
> >>>>>>> PBA: BAR=4 offset=00000800
> >>>>>>> Capabilities: [d0] Vital Product Data
> >>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>> Not readable
> >>>>>>> Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>> Capabilities: [140 v1] Virtual Channel
> >>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
> >>>>>>> Ctrl: ArbSelect=Fixed
> >>>>>>> Status: InProgress-
> >>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>> Status: NegoPending- InProgress-
> >>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>> Max snoop latency: 71680ns
> >>>>>>> Max no snoop latency: 71680ns
> >>>>>>> Kernel driver in use: r8169
> >>>>>>> Kernel modules: r8169
> >>>>>>>
> >>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>
> >>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>
> >>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>> situation.
> >>>>>>>>>
> >>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>
> >>>>>>>>> lshw shows:
> >>>>>>>>> description: Ethernet interface
> >>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>> physical id: 0
> >>>>>>>>> bus info: pci@0000:03:00.0
> >>>>>>>>> logical name: enp3s0
> >>>>>>>>> version: 0c
> >>>>>>>>> serial:
> >>>>>>>>> size: 1Gbit/s
> >>>>>>>>> capacity: 1Gbit/s
> >>>>>>>>> width: 64 bits
> >>>>>>>>> clock: 33MHz
> >>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>> resources: irq:19 ioport:d000(size=256)
> >>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>
> >>>>>>>>> Kind Regards,
> >>>>>>>>>
> >>>>>>>>> Peter.
> >>>>>>>>>
> >>>>>>>> Hi Peter,
> >>>>>>>>
> >>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>
> >>>>>>>> - Can you provide any measurements?
> >>>>>>>> - iperf results before and after
> >>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>> - Do you use jumbo packets?
> >>>>>>>>
> >>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>
> >>>>>>>> Heiner
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
>
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-02-05 18:50 ` Heiner Kallweit
2019-02-05 18:53 ` Heiner Kallweit
2019-02-11 6:23 ` David Chang
@ 2019-02-14 2:45 ` David Chang
2019-02-14 6:17 ` Heiner Kallweit
2 siblings, 1 reply; 25+ messages in thread
From: David Chang @ 2019-02-14 2:45 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen
Hi Heiner,
On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> Hi David,
>
> meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
>
> Symptom there is also a significant number of rx_missed packets.
> Could you try what I mentioned there last:
> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> end of rtl_hw_start_8168h_1() being disabled.
After disabling the ASPM call that you mentioned, we finally got a
positive test result, and the rx_missed errors were gone. Without
the patch, the receive side goes back to bad performance.
kernel: r8169: loading out-of-tree module taints kernel.
kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
kernel: libphy: r8169: probed
kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
NIC statistics:
tx_packets: 1653804
rx_packets: 1555966
tx_errors: 0
rx_errors: 0
rx_missed: 0
align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
unicast: 1555884
broadcast: 78
multicast: 4
tx_aborted: 0
tx_underrun: 0
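For anyone repeating this test, the counters above can be checked mechanically. The following is a minimal Python sketch, assuming output in the "name: value" form that "ethtool -S <if>" prints; the helper name parse_nic_stats and the embedded sample are illustrative only and not part of ethtool or the driver.

```python
def parse_nic_stats(text):
    """Parse 'name: value' lines (as printed by `ethtool -S <if>`) into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


# Sample copied from the statistics reported above (illustrative input).
SAMPLE = """\
NIC statistics:
     tx_packets: 1653804
     rx_packets: 1555966
     rx_errors: 0
     rx_missed: 0
"""

stats = parse_nic_stats(SAMPLE)
# Express missed frames relative to received frames; guard against division by zero.
missed_ratio = stats["rx_missed"] / max(stats["rx_packets"], 1)
print(f"rx_missed={stats['rx_missed']} ({missed_ratio:.4%} of rx_packets)")
```

Comparing rx_missed before and after an iperf run, rather than reading a single snapshot, is probably the more useful check, since the counter is cumulative.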
iperf receive:
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.x.x.x, port 55516
[ 5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 108 MBytes 906 Mbits/sec
[ 5] 1.00-2.00 sec 112 MBytes 941 Mbits/sec
[ 5] 2.00-3.00 sec 112 MBytes 940 Mbits/sec
[ 5] 3.00-4.00 sec 112 MBytes 941 Mbits/sec
[ 5] 4.00-5.00 sec 112 MBytes 941 Mbits/sec
[ 5] 5.00-6.00 sec 112 MBytes 942 Mbits/sec
[ 5] 6.00-7.00 sec 112 MBytes 939 Mbits/sec
[ 5] 7.00-8.00 sec 112 MBytes 941 Mbits/sec
[ 5] 8.00-9.00 sec 112 MBytes 938 Mbits/sec
[ 5] 9.00-10.00 sec 112 MBytes 941 Mbits/sec
[ 5] 10.00-11.00 sec 112 MBytes 941 Mbits/sec
[...]
[ 5] 50.00-51.00 sec 112 MBytes 941 Mbits/sec
[ 5] 51.00-52.00 sec 112 MBytes 941 Mbits/sec
[ 5] 52.00-53.00 sec 112 MBytes 942 Mbits/sec
[ 5] 53.00-54.00 sec 112 MBytes 941 Mbits/sec
[ 5] 54.00-55.00 sec 111 MBytes 934 Mbits/sec
[ 5] 55.00-56.00 sec 112 MBytes 942 Mbits/sec
[ 5] 56.00-57.00 sec 112 MBytes 937 Mbits/sec
[ 5] 57.00-58.00 sec 112 MBytes 941 Mbits/sec
[ 5] 58.00-59.00 sec 111 MBytes 932 Mbits/sec
[ 5] 59.00-60.00 sec 112 MBytes 942 Mbits/sec
[ 5] 60.00-60.04 sec 4.06 MBytes 939 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-60.04 sec 6.57 GBytes 940 Mbits/sec receiver
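As a sanity check on the summary line, the reported bitrate follows from the transfer size and the interval. A small Python sketch, under the assumption that iperf3 counts GBytes in units of 2^30 bytes and Mbits/sec in units of 10^6 bits per second; the function name is made up for this example.

```python
def mbits_per_sec(gbytes, seconds):
    """Convert a transfer size in GBytes (2**30 bytes, assumed) to Mbits/sec (10**6 bits)."""
    bits = gbytes * 2**30 * 8
    return bits / seconds / 1e6


# Values taken from the receiver summary line above.
rate = mbits_per_sec(6.57, 60.04)
print(f"{rate:.0f} Mbits/sec")
```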
regards,
David
>
> Heiner
>
>
> On 31.01.2019 03:32, David Chang wrote:
> > Hi,
> >
> > We had a similar case here.
> > - Realtek r8169 receive performance regression in kernel 4.19
> > https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >
> > kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > The major symptom is a high rx_missed count.
> >
> >
> > On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >> Hi Peter,
> >>
> >> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >> do the trick. Can you also check with pcie_aspm.policy=performance?
> >
> > We will give it a try later.
> >
> >> And please check with "ethtool -S <if>" whether the chip statistics
> >> show a significant number of errors.
> >>
> >> If this doesn't help you may have to bisect to find the offending commit.
> >
> > We tried reverting the driver to each of the following earlier commits,
> > but with no luck.
> >
> > 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >
> > Thanks,
> > David Chang
> >
> >>
> >> Heiner
> >>
> >>
> >> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>> and this made no difference.
> >>>
> >>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>> confirm that this immediately resolved the issue and access to the NFS
> >>> shares operated as expected.
> >>>
> >>> I presume this means it is an issue with the r8169 driver included in
> >>> 4.19 onwards?
> >>>
> >>> To answer your last questions:
> >>>
> >>> Base Board Information
> >>> Manufacturer: Alienware
> >>> Product Name: 0PGRP5
> >>> Version: A02
> >>>
> >>> ... and yes, the RTL8168 is the onboard network chip.
> >>>
> >>> Regards,
> >>>
> >>> Peter.
> >>>
> >>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> I think the vendor driver doesn't enable ASPM by default.
> >>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>> A few older systems seem to have issues with ASPM. What kind of
> >>>> system / mainboard are you using? Is the RTL8168 the onboard
> >>>> network chip?
> >>>>
> >>>> Rgds, Heiner
> >>>>
> >>>>
> >>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>> Hi Heiner,
> >>>>>
> >>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>> a good idea.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> at first glance it doesn't look like a typical driver issue.
> >>>>>> What you could do:
> >>>>>>
> >>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>
> >>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>
> >>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>
> >>>>>> Any specific reason why you think the root cause is in the driver and not
> >>>>>> elsewhere in the network subsystem?
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>>
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-02-14 2:45 ` David Chang
@ 2019-02-14 6:17 ` Heiner Kallweit
2019-02-15 2:51 ` David Chang
0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-14 6:17 UTC (permalink / raw)
To: David Chang; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen
Hi David,
On 14.02.2019 03:45, David Chang wrote:
> Hi Heiner,
>
> On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
>> Hi David,
>>
>> meanwhile there's the following bug report matching what you reported.
>> It's even the same chip version (RTL8168h).
>> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
>>
>> Symptom there is also a significant number of rx_missed packets.
>> Could you try what I mentioned there last:
>> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
>> end of rtl_hw_start_8168h_1() being disabled.
>
> After disabling the ASPM call that you mentioned, we finally got a
> positive test result, and the rx_missed errors were gone. Without
> the patch, the receive side goes back to bad performance.
>
Good to know, thanks. I also checked with Realtek, and they confirmed that their Windows
driver uses some heuristics to disable ASPM under high load. So it seems like there
is some hw issue. Still open is whether this affects only certain chip versions.
Let's see whether they can provide more information.
Disabling ASPM in general would hurt notebook users, because past measurements
show that ASPM can save a significant amount of energy.
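For readers who want to see which ASPM policy a system is currently using before experimenting, a minimal Python sketch follows. It assumes the kernel exposes /sys/module/pcie_aspm/parameters/policy and marks the active policy with square brackets; the function names are illustrative and not taken from any existing tool.

```python
import re
from pathlib import Path

# sysfs file for the global PCIe ASPM policy (assumed to exist on this kernel).
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) entry from the policy string, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Read the active policy from sysfs; return None if the file is unavailable."""
    try:
        return active_aspm_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("active ASPM policy:", read_aspm_policy())
```

This only reports the global policy. Whether ASPM is actually enabled on the NIC's link is a separate question, which the LnkCtl line of "lspci -vv" answers.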
> kernel: r8169: loading out-of-tree module taints kernel.
> kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
> kernel: libphy: r8169: probed
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
> kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
> kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
> kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
>
> NIC statistics:
> tx_packets: 1653804
> rx_packets: 1555966
> tx_errors: 0
> rx_errors: 0
> rx_missed: 0
> align_errors: 0
> tx_single_collisions: 0
> tx_multi_collisions: 0
> unicast: 1555884
> broadcast: 78
> multicast: 4
> tx_aborted: 0
> tx_underrun: 0
>
> iperf receive:
> -----------------------------------------------------------
> Server listening on 5201
> -----------------------------------------------------------
> Accepted connection from 10.x.x.x, port 55516
> [ 5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-1.00 sec 108 MBytes 906 Mbits/sec
> [ 5] 1.00-2.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 2.00-3.00 sec 112 MBytes 940 Mbits/sec
> [ 5] 3.00-4.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 4.00-5.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 5.00-6.00 sec 112 MBytes 942 Mbits/sec
> [ 5] 6.00-7.00 sec 112 MBytes 939 Mbits/sec
> [ 5] 7.00-8.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 8.00-9.00 sec 112 MBytes 938 Mbits/sec
> [ 5] 9.00-10.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 10.00-11.00 sec 112 MBytes 941 Mbits/sec
> [...]
> [ 5] 50.00-51.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 51.00-52.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 52.00-53.00 sec 112 MBytes 942 Mbits/sec
> [ 5] 53.00-54.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 54.00-55.00 sec 111 MBytes 934 Mbits/sec
> [ 5] 55.00-56.00 sec 112 MBytes 942 Mbits/sec
> [ 5] 56.00-57.00 sec 112 MBytes 937 Mbits/sec
> [ 5] 57.00-58.00 sec 112 MBytes 941 Mbits/sec
> [ 5] 58.00-59.00 sec 111 MBytes 932 Mbits/sec
> [ 5] 59.00-60.00 sec 112 MBytes 942 Mbits/sec
> [ 5] 60.00-60.04 sec 4.06 MBytes 939 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-60.04 sec 6.57 GBytes 940 Mbits/sec receiver
>
> regards,
> David
>
Heiner
* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
2019-02-14 6:17 ` Heiner Kallweit
@ 2019-02-15 2:51 ` David Chang
0 siblings, 0 replies; 25+ messages in thread
From: David Chang @ 2019-02-15 2:51 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen
Hi Heiner,
On Feb 14, 2019 at 07:17:44 +0100, Heiner Kallweit wrote:
> Hi David,
>
> On 14.02.2019 03:45, David Chang wrote:
> > Hi Heiner,
> >
> > On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> >> Hi David,
> >>
> >> meanwhile there's the following bug report matching what you reported.
> >> It's even the same chip version (RTL8168h).
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
> >>
> >> Symptom there is also a significant number of rx_missed packets.
> >> Could you try what I mentioned there last:
> >> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> >> end of rtl_hw_start_8168h_1() being disabled.
> >
> > After disabling the ASPM call that you mentioned, we finally got a
> > positive test result, and the rx_missed errors were gone. Without
> > the patch, the receive side goes back to bad performance.
> >
> Good to know, thanks. I also checked with Realtek, and they confirmed that their Windows
> driver uses some heuristics to disable ASPM under high load. So it seems like there
> is some hw issue. Still open is whether this affects only certain chip versions.
> Let's see whether they can provide more information.
Ok!
> Disabling ASPM in general would hurt notebook users, because past measurements
> show that ASPM can save a significant amount of energy.
I understand, thanks!
regards,
David
Thread overview: 25+ messages
2019-01-28 11:13 r8169 Driver - Poor Network Performance Since Kernel 4.19 Peter Ceiley
2019-01-28 18:28 ` Heiner Kallweit
2019-01-28 22:10 ` Peter Ceiley
2019-01-29 6:16 ` Heiner Kallweit
2019-01-29 6:20 ` Peter Ceiley
2019-01-29 6:44 ` Heiner Kallweit
2019-01-30 9:59 ` Peter Ceiley
2019-01-30 19:15 ` Heiner Kallweit
2019-01-31 2:32 ` David Chang
2019-01-31 6:21 ` Heiner Kallweit
2019-01-31 6:35 ` Heiner Kallweit
2019-01-31 6:49 ` Heiner Kallweit
2019-01-31 7:23 ` David Chang
2019-01-31 12:09 ` Peter Ceiley
2019-01-31 18:28 ` Heiner Kallweit
2019-02-01 4:27 ` David Chang
2019-02-01 4:29 ` David Chang
2019-02-01 6:32 ` Heiner Kallweit
2019-02-02 12:25 ` Heiner Kallweit
2019-02-05 18:50 ` Heiner Kallweit
2019-02-05 18:53 ` Heiner Kallweit
2019-02-11 6:23 ` David Chang
2019-02-14 2:45 ` David Chang
2019-02-14 6:17 ` Heiner Kallweit
2019-02-15 2:51 ` David Chang