Netdev Archive on lore.kernel.org
* r8169 Driver - Poor Network Performance Since Kernel 4.19
@ 2019-01-28 11:13 Peter Ceiley
  2019-01-28 18:28 ` Heiner Kallweit
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-28 11:13 UTC
  To: Realtek linux nic maintainers, Heiner Kallweit; +Cc: netdev

Hi,

I have been experiencing very poor network performance since Kernel
4.19 and I'm confident it's related to the r8169 driver.

I have no issue with kernel versions 4.18 and prior. I am experiencing
this issue in kernels 4.19 and 4.20 (currently running/testing with
4.20.4 & 4.19.18).

If someone could guide me in the right direction, I'm happy to help
troubleshoot this issue. Note that I have been keeping an eye on one
issue related to loading of the PHY driver; however, my symptoms
differ in that I still have a network connection. I have attempted to
reload the driver on a running system, but this does not improve the
situation.

Using the proprietary r8168 driver returns my device to proper working order.

lshw shows:
       description: Ethernet interface
       product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
       vendor: Realtek Semiconductor Co., Ltd.
       physical id: 0
       bus info: pci@0000:03:00.0
       logical name: enp3s0
       version: 0c
       serial:
       size: 1Gbit/s
       capacity: 1Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress msix vpd bus_master cap_list ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=r8169 duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
       resources: irq:19 ioport:d000(size=256) memory:f7b00000-f7b00fff memory:f2100000-f2103fff

Kind Regards,

Peter.


* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-28 11:13 r8169 Driver - Poor Network Performance Since Kernel 4.19 Peter Ceiley
@ 2019-01-28 18:28 ` Heiner Kallweit
  2019-01-28 22:10   ` Peter Ceiley
  0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-28 18:28 UTC
  To: Peter Ceiley, Realtek linux nic maintainers; +Cc: netdev

Hi Peter,

the description "poor network performance" is quite vague, therefore:

- Can you provide any measurements?
- iperf results before and after
- statistics about dropped packets (rx and/or tx)
- Do you use jumbo packets?

Also helpful would be the "lspci -vv" output for the network card and
the dmesg output line with the chip XID.
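
The drop and error statistics asked for above can be read from the kernel's per-interface counters in sysfs. The following is only a minimal sketch: the function name `nic_counters` and its optional second argument are made up for illustration, and `enp3s0` is the interface name from the lshw output.

```shell
# Print drop and error counters for one network interface from sysfs.
# $1: interface name, e.g. enp3s0
# $2: sysfs root (hypothetical override, only useful for testing)
nic_counters() {
    iface="$1"
    stats_dir="${2:-/sys/class/net}/$iface/statistics"
    for counter in rx_dropped tx_dropped rx_errors tx_errors; do
        printf '%s=%s\n' "$counter" "$(cat "$stats_dir/$counter")"
    done
}
```

Running `nic_counters enp3s0` before and after a slow operation, on each kernel, would show whether any of these counters move.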

Heiner


* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-28 18:28 ` Heiner Kallweit
@ 2019-01-28 22:10   ` Peter Ceiley
  2019-01-29  6:16     ` Heiner Kallweit
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-28 22:10 UTC
  To: Heiner Kallweit, Realtek linux nic maintainers; +Cc: netdev

Hi Heiner,

Thanks for getting back to me.

No, I don't use jumbo packets.

Bandwidth is *generally* good, and iperf to my NAS shows over
900 Mbits/s in both cases. The issue seems to appear when
establishing a connection and is most noticeable on my mounted NFS
shares, where it takes seconds (up to tens of seconds for larger
directories) to list the contents of each directory. Once a file
transfer begins, I appear to get good bandwidth.

I'm unsure what data would be most useful for troubleshooting this
issue. Running the following

    netstat -s | grep retransmitted

shows a steady increase in retransmitted segments each time I list the
contents of a remote directory. For example, on kernel 4.19.18, running
'ls' on a directory containing 345 media files increased the
retransmitted segment count by 21, and the 'time' command showed:
    real    0m19.867s
    user    0m0.012s
    sys    0m0.036s

On kernel 4.18.16 the same command produces no retransmitted segments,
and 'time' showed:
    real    0m0.300s
    user    0m0.004s
    sys    0m0.007s
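
The before/after comparison described above could be automated roughly as follows. This is a sketch, not the method used in the measurements: it reads the `RetransSegs` field from `/proc/net/snmp`, which is the system-wide TCP counter (not per interface), so other traffic on the machine also contributes. The function names are invented for illustration.

```shell
# Print the TCP RetransSegs counter from a /proc/net/snmp-style file.
# The file holds a "Tcp:" header line with field names followed by a
# "Tcp:" line with the matching values.
retrans_segs() {
    snmp_file="${1:-/proc/net/snmp}"
    awk '
        $1 == "Tcp:" && !have_header {
            for (i = 2; i <= NF; i++) if ($i == "RetransSegs") col = i
            have_header = 1
            next
        }
        $1 == "Tcp:" { print $col; exit }
    ' "$snmp_file"
}

# Run a command and report how many TCP segments were retransmitted
# system-wide while it ran.
retrans_delta() {
    before=$(retrans_segs)
    "$@" > /dev/null
    after=$(retrans_segs)
    echo "retransmitted segments during '$*': $((after - before))"
}
```

A call such as `retrans_delta ls /mnt/nas/media` (the path is a placeholder) would then give one number per run that can be compared between kernels.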

ifconfig shows no RX/TX errors or dropped packets in either case.

dmesg XID:
[    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g, f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32

# lspci -vv
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
    Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 19
    Region 0: I/O ports at d000 [size=256]
    Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
    Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [70] Express (v2) Endpoint, MSI 01
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
        DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
             AtomicOpsCtl: ReqEn-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00000800
    Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
        Not readable
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
        CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [140 v1] Virtual Channel
        Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:    ArbSelect=Fixed
        Status:    InProgress-
        VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status:    NegoPending- InProgress-
    Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
    Capabilities: [170 v1] Latency Tolerance Reporting
        Max snoop latency: 71680ns
        Max no snoop latency: 71680ns
    Kernel driver in use: r8169
    Kernel modules: r8169

Please let me know if you have any other ideas in terms of testing.

Thanks!

Peter.

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-28 22:10   ` Peter Ceiley
@ 2019-01-29  6:16     ` Heiner Kallweit
  2019-01-29  6:20       ` Peter Ceiley
  0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-29  6:16 UTC
  To: Peter Ceiley, Realtek linux nic maintainers; +Cc: netdev

Hi Peter,

At first glance, this doesn't look like a typical driver issue.
What you could do:

- Test the r8169.c from 4.18 on top of 4.19.

- Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.

- Bisect between 4.18 and 4.19 to find the offending commit.
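
Before changing anything for the ASPM check, it may help to record what the link currently reports. The following is a small sketch, assuming `lspci -vv` prints a `LnkCtl:` line of the form seen earlier in this thread ("ASPM L1 Enabled; ..."); the function name `aspm_link_state` is made up for illustration.

```shell
# Read `lspci -vv` output on stdin and print the ASPM part of each
# LnkCtl line, e.g. "L1 Enabled" or "Disabled".
aspm_link_state() {
    sed -n 's/^[[:space:]]*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}
```

For the card in question this could be run as `sudo lspci -vv -s 03:00.0 | aspm_link_state`, once on each kernel and again after any ASPM change, to confirm the setting actually took effect.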

Any specific reason why you think the root cause is in the driver and
not elsewhere in the network subsystem?

Heiner

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-29  6:16     ` Heiner Kallweit
@ 2019-01-29  6:20       ` Peter Ceiley
  2019-01-29  6:44         ` Heiner Kallweit
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-29  6:20 UTC
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

Hi Heiner,

Thanks, I'll do some more testing. It might not be the driver - I
assumed it was because using the r8168 driver 'resolves' the issue.
I'll see if I can test the r8169.c from 4.18 on top of 4.19 - this is
a good idea.

Cheers,

Peter.

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-29  6:20       ` Peter Ceiley
@ 2019-01-29  6:44         ` Heiner Kallweit
  2019-01-30  9:59           ` Peter Ceiley
  0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-29  6:44 UTC
  To: Peter Ceiley; +Cc: Realtek linux nic maintainers, netdev

Hi Peter,

I think the vendor driver doesn't enable ASPM by default, so it's
worth trying to disable ASPM in the BIOS or via sysfs. A few older
systems seem to have issues with ASPM. What kind of system /
mainboard are you using? Is the RTL8168 the onboard network chip?
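
A minimal sketch of the sysfs route, assuming the kernel exposes
/sys/module/pcie_aspm/parameters/policy (it lists the available
policies with the active one in square brackets). The helper name
active_aspm_policy is made up for illustration:

```shell
# Print the active ASPM policy, i.e. the entry shown in [brackets].
# The optional argument lets the helper read a saved copy of the file;
# the default is the sysfs path assumed above.
active_aspm_policy() {
    policy_file="${1:-/sys/module/pcie_aspm/parameters/policy}"
    sed -n 's/.*\[\([a-z_]*\)].*/\1/p' "$policy_file"
}

# Usage (changing the policy needs root, and the write can be refused
# if the firmware has not handed ASPM control to the OS):
#   active_aspm_policy
#   echo performance > /sys/module/pcie_aspm/parameters/policy
```

If the helper still prints a power-saving policy after the write, the
change did not take effect and the BIOS setting is the next thing to try.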

Rgds, Heiner


On 29.01.2019 07:20, Peter Ceiley wrote:
> Hi Heiner,
> 
> Thanks, I'll do some more testing. It might not be the driver - I
> assumed it was due to the fact that using the r8168 driver 'resolves'
> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> a good idea.
> 
> Cheers,
> 
> Peter.
> 



* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-29  6:44         ` Heiner Kallweit
@ 2019-01-30  9:59           ` Peter Ceiley
  2019-01-30 19:15             ` Heiner Kallweit
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-30  9:59 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

Hi Heiner,

I tried disabling ASPM using the pcie_aspm=off kernel parameter and
this made no difference.
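
One way to confirm whether the parameter actually took effect is to look
at the ASPM field of the LnkCtl line in "lspci -vv". The sketch below
assumes the LnkCtl format shown in the lspci output earlier in this
thread; the helper name lnkctl_aspm_state is hypothetical:

```shell
# Read `lspci -vv` output on stdin and print the ASPM state from the
# LnkCtl line, e.g. "L1 Enabled" or "Disabled".
lnkctl_aspm_state() {
    sed -n 's/.*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Usage (run as root so lspci can read the capabilities; 03:00.0 is the
# NIC's bus address from the lspci output in this thread):
#   lspci -vv -s 03:00.0 | lnkctl_aspm_state
```

If this still reports "L1 Enabled" after booting with pcie_aspm=off, the
link is still using ASPM despite the parameter.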

I compiled the 4.18.16 r8169.c against the 4.19.18 source and loaded
the resulting module into the running 4.19.18 kernel. I can confirm
that this immediately resolved the issue and access to the NFS
shares operated as expected.

I presume this means it is an issue with the r8169 driver included in
4.19 onwards?

To answer your last questions:

Base Board Information
    Manufacturer: Alienware
    Product Name: 0PGRP5
    Version: A02

... and yes, the RTL8168 is the onboard network chip.

Regards,

Peter.

On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> Hi Peter,
>
> I think the vendor driver doesn't enable ASPM by default, so it's
> worth trying to disable ASPM in the BIOS or via sysfs. A few older
> systems seem to have issues with ASPM. What kind of system /
> mainboard are you using? Is the RTL8168 the onboard network chip?
>
> Rgds, Heiner
>
>


* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-30  9:59           ` Peter Ceiley
@ 2019-01-30 19:15             ` Heiner Kallweit
  2019-01-31  2:32               ` David Chang
  0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-30 19:15 UTC (permalink / raw)
  To: Peter Ceiley; +Cc: Realtek linux nic maintainers, netdev

Hi Peter,

Recently I had a case where pcie_aspm=off, for whatever reason, didn't
do the trick. Can you also check with pcie_aspm.policy=performance?

And please check with "ethtool -S <if>" whether the chip statistics
show a significant number of errors.
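
A rough sketch of such a check: the filter below keeps only the non-zero
counters whose names look like error counters. The helper name and the
name pattern are assumptions for illustration, so adjust the pattern to
the counter names your driver actually reports:

```shell
# Read `ethtool -S <if>` output on stdin and print the counters whose
# name suggests a problem and whose value is non-zero, as name=value.
nonzero_error_counters() {
    awk -F: '
        $1 ~ /err|missed|aborted|underrun|collision/ && $2 + 0 > 0 {
            gsub(/^[ \t]+/, "", $1)      # drop ethtool indentation
            print $1 "=" ($2 + 0)
        }'
}

# Usage (enp3s0 is the interface name from the lshw output in this thread):
#   ethtool -S enp3s0 | nonzero_error_counters
```

An empty result means none of the matched counters has moved since the
interface was brought up.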

If this doesn't help you may have to bisect to find the offending commit.

Heiner


On 30.01.2019 10:59, Peter Ceiley wrote:
> Hi Heiner,
> 
> I tried disabling ASPM using the pcie_aspm=off kernel parameter and
> this made no difference.
> 
> I compiled the 4.18.16 r8169.c against the 4.19.18 source and loaded
> the resulting module into the running 4.19.18 kernel. I can confirm
> that this immediately resolved the issue and access to the NFS
> shares operated as expected.
> 
> I presume this means it is an issue with the r8169 driver included in
> 4.19 onwards?
> 
> To answer your last questions:
> 
> Base Board Information
>     Manufacturer: Alienware
>     Product Name: 0PGRP5
>     Version: A02
> 
> ... and yes, the RTL8168 is the onboard network chip.
> 
> Regards,
> 
> Peter.
> 
> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> Hi Peter,
>>
>> I think the vendor driver doesn't enable ASPM per default.
>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>> Few older systems seem to have issues with ASPM, what kind of
>> system / mainboard are you using? The RTL8168 is the onboard
>> network chip?
>>
>> Rgds, Heiner
>>
>>
>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> Thanks, I'll do some more testing. It might not be the driver - I
>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>> a good idea.
>>>
>>> Cheers,
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> at a first glance it doesn't look like a typical driver issue.
>>>> What you could do:
>>>>
>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>
>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>
>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>
>>>> Any specific reason why you think root cause is in the driver and not
>>>> elsewhere in the network subsystem?
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> Thanks for getting back to me.
>>>>>
>>>>> No, I don't use jumbo packets.
>>>>>
>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>> establishing a connection and is most notable, for example, on my
>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>> larger directories) to list the contents of each directory. Once a
>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>
>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>> troubleshoot this issue. Running the following
>>>>>
>>>>>     netstat -s |grep retransmitted
>>>>>
>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>> directory containing 345 media files did the following using kernel
>>>>> 4.19.18:
>>>>>
>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>> the following:
>>>>>     real    0m19.867s
>>>>>     user    0m0.012s
>>>>>     sys    0m0.036s
>>>>>
>>>>> The same command shows no retransmitted segments running kernel
>>>>> 4.18.16 and 'time' showed:
>>>>>     real    0m0.300s
>>>>>     user    0m0.004s
>>>>>     sys    0m0.007s
>>>>>
>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>
>>>>> dmesg XID:
>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>
>>>>> # lspci -vv
>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>     Capabilities: [40] Power Management version 3
>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>         Address: 0000000000000000  Data: 0000
>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>> <512ns, L1 <64us
>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>> SlotPowerLimit 10.000W
>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>> Latency L0s unlimited, L1 <64us
>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>> OBFF Via message/WAKE#
>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>> OBFF Disabled
>>>>>              AtomicOpsCtl: ReqEn-
>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>              Transmit Margin: Normal Operating Range,
>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>              Compliance De-emphasis: -6dB
>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>         Vector table: BAR=4 offset=00000000
>>>>>         PBA: BAR=4 offset=00000800
>>>>>     Capabilities: [d0] Vital Product Data
>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>         Not readable
>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>         Status:    InProgress-
>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>             Status:    NegoPending- InProgress-
>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>         Max snoop latency: 71680ns
>>>>>         Max no snoop latency: 71680ns
>>>>>     Kernel driver in use: r8169
>>>>>     Kernel modules: r8169
>>>>>
>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Peter.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>
>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>
>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>> situation.
>>>>>>>
>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>
>>>>>>> lshw shows:
>>>>>>>        description: Ethernet interface
>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>        physical id: 0
>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>        logical name: enp3s0
>>>>>>>        version: 0c
>>>>>>>        serial:
>>>>>>>        size: 1Gbit/s
>>>>>>>        capacity: 1Gbit/s
>>>>>>>        width: 64 bits
>>>>>>>        clock: 33MHz
>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>> 1000bt-fd autonegotiation
>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>
>>>>>> - Can you provide any measurements?
>>>>>> - iperf results before and after
>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>> - Do you use jumbo packets?
>>>>>>
>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>> the dmesg output line with the chip XID.
>>>>>>
>>>>>> Heiner
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-30 19:15             ` Heiner Kallweit
@ 2019-01-31  2:32               ` David Chang
  2019-01-31  6:21                 ` Heiner Kallweit
                                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: David Chang @ 2019-01-31  2:32 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev

Hi,

We had a similar case here.
- Realtek r8169 receive performance regression in kernel 4.19
  https://bugzilla.suse.com/show_bug.cgi?id=1119649

kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
The major symptom is a high rx_missed count.
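
A minimal sketch for spotting this symptom, assuming the counters come
from "ethtool -S <if>". The helper name nonzero_errors and the list of
name patterns are assumptions for illustration; counter names vary by
driver.

```shell
# Hypothetical helper: print only the non-zero error-type counters from
# "ethtool -S <if>" output read on stdin.
nonzero_errors() {
    awk -F': *' '
        # Counter lines look like "     rx_missed: 1234".
        ($1 ~ /(err|miss|drop|abort|underrun)/) && ($2 + 0 > 0) {
            gsub(/^[ \t]+/, "", $1)   # strip the leading indentation
            print $1 ": " $2
        }'
}

# Usage (eth0 is an assumed interface name):
#   ethtool -S eth0 | nonzero_errors
```

Sampling this before and after a receive-heavy test shows whether a
counter such as rx_missed is still growing.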


On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> Hi Peter,
> 
> recently I had somebody where pcie_aspm=off for whatever reason didn't
> do the trick, can you also check with pcie_aspm.policy=performance.

We will give it a try later.

> And please check with "ethtool -S <if>" whether the chip statistics
> show a significant number of errors.
> 
> If this doesn't help you may have to bisect to find the offending commit.

We tried falling back the driver to a few earlier commits, listed below,
but with no luck.

9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)

Thanks,
David Chang

> 
> Heiner
> 
> 
> On 30.01.2019 10:59, Peter Ceiley wrote:
> > Hi Heiner,
> > 
> > I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> > and this made no difference.
> > 
> > I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> > subsequently loaded the module in the running 4.19.18 kernel. I can
> > confirm that this immediately resolved the issue and access to the NFS
> > shares operated as expected.
> > 
> > I presume this means it is an issue with the r8169 driver included in
> > 4.19 onwards?
> > 
> > To answer your last questions:
> > 
> > Base Board Information
> >     Manufacturer: Alienware
> >     Product Name: 0PGRP5
> >     Version: A02
> > 
> > ... and yes, the RTL8168 is the onboard network chip.
> > 
> > Regards,
> > 
> > Peter.
> > 
> > On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>
> >> Hi Peter,
> >>
> >> I think the vendor driver doesn't enable ASPM per default.
> >> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >> Few older systems seem to have issues with ASPM, what kind of
> >> system / mainboard are you using? The RTL8168 is the onboard
> >> network chip?
> >>
> >> Rgds, Heiner
> >>
> >>
> >> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> Thanks, I'll do some more testing. It might not be the driver - I
> >>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>> a good idea.
> >>>
> >>> Cheers,
> >>>
> >>> Peter.
> >>>
> >>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> at a first glance it doesn't look like a typical driver issue.
> >>>> What you could do:
> >>>>
> >>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>
> >>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>
> >>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>
> >>>> Any specific reason why you think root cause is in the driver and not
> >>>> elsewhere in the network subsystem?
> >>>>
> >>>> Heiner
> >>>>
> >>>>
> >>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>> Hi Heiner,
> >>>>>
> >>>>> Thanks for getting back to me.
> >>>>>
> >>>>> No, I don't use jumbo packets.
> >>>>>
> >>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>> establishing a connection and is most notable, for example, on my
> >>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>> larger directories) to list the contents of each directory. Once a
> >>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>
> >>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>> troubleshoot this issue. Running the following
> >>>>>
> >>>>>     netstat -s |grep retransmitted
> >>>>>
> >>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>> directory containing 345 media files did the following using kernel
> >>>>> 4.19.18:
> >>>>>
> >>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>> the following:
> >>>>>     real    0m19.867s
> >>>>>     user    0m0.012s
> >>>>>     sys    0m0.036s
> >>>>>
> >>>>> The same command shows no retransmitted segments running kernel
> >>>>> 4.18.16 and 'time' showed:
> >>>>>     real    0m0.300s
> >>>>>     user    0m0.004s
> >>>>>     sys    0m0.007s
> >>>>>
> >>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>
> >>>>> dmesg XID:
> >>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>
> >>>>> # lspci -vv
> >>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>     Latency: 0, Cache Line Size: 64 bytes
> >>>>>     Interrupt: pin A routed to IRQ 19
> >>>>>     Region 0: I/O ports at d000 [size=256]
> >>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>     Capabilities: [40] Power Management version 3
> >>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>         Address: 0000000000000000  Data: 0000
> >>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>> <512ns, L1 <64us
> >>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>> SlotPowerLimit 10.000W
> >>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>> Latency L0s unlimited, L1 <64us
> >>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>> OBFF Via message/WAKE#
> >>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>> OBFF Disabled
> >>>>>              AtomicOpsCtl: ReqEn-
> >>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>              Transmit Margin: Normal Operating Range,
> >>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>              Compliance De-emphasis: -6dB
> >>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>         Vector table: BAR=4 offset=00000000
> >>>>>         PBA: BAR=4 offset=00000800
> >>>>>     Capabilities: [d0] Vital Product Data
> >>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>         Not readable
> >>>>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>     Capabilities: [140 v1] Virtual Channel
> >>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>>>         Ctrl:    ArbSelect=Fixed
> >>>>>         Status:    InProgress-
> >>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>             Status:    NegoPending- InProgress-
> >>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>         Max snoop latency: 71680ns
> >>>>>         Max no snoop latency: 71680ns
> >>>>>     Kernel driver in use: r8169
> >>>>>     Kernel modules: r8169
> >>>>>
> >>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>
> >>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>
> >>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>
> >>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>> situation.
> >>>>>>>
> >>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>
> >>>>>>> lshw shows:
> >>>>>>>        description: Ethernet interface
> >>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>        physical id: 0
> >>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>        logical name: enp3s0
> >>>>>>>        version: 0c
> >>>>>>>        serial:
> >>>>>>>        size: 1Gbit/s
> >>>>>>>        capacity: 1Gbit/s
> >>>>>>>        width: 64 bits
> >>>>>>>        clock: 33MHz
> >>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>> 1000bt-fd autonegotiation
> >>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>
> >>>>>>> Kind Regards,
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>
> >>>>>> - Can you provide any measurements?
> >>>>>> - iperf results before and after
> >>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>> - Do you use jumbo packets?
> >>>>>>
> >>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>> the dmesg output line with the chip XID.
> >>>>>>
> >>>>>> Heiner
> >>>>>
> >>>>
> >>>
> >>
> > 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31  2:32               ` David Chang
@ 2019-01-31  6:21                 ` Heiner Kallweit
  2019-01-31  6:35                   ` Heiner Kallweit
  2019-02-02 12:25                 ` Heiner Kallweit
  2019-02-05 18:50                 ` Heiner Kallweit
  2 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-31  6:21 UTC (permalink / raw)
  To: David Chang; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev

David, thanks for the link to the bug ticket.
I think only a proper bisect can help to find the offending commit.
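
A rough sketch of what such a bisect could look like, limited to the
Realtek driver directory. This assumes a mainline tree with the v4.18
and v4.19 tags and that the regression is in the driver itself; the
verdict helper and its threshold are hypothetical.

```shell
# Driver-scoped bisect between the two releases (bad first, then good):
#   git bisect start v4.19 v4.18 -- drivers/net/ethernet/realtek/
# At each step: build, boot, run the test, then report the result:
#   git bisect "$(verdict "$before" "$after")"

# Hypothetical helper: classify one test run from an error counter
# (for example rx_missed) sampled before and after the workload.
# Prints "good" or "bad"; the optional third argument is the tolerated
# increase (default 0).
verdict() {
    before=$1 after=$2 threshold=${3:-0}
    if [ $((after - before)) -gt "$threshold" ]; then
        echo bad
    else
        echo good
    fi
}
```

Each step needs a rebuilt kernel and a reboot, so the verdict would be
passed to git by hand rather than through "git bisect run".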

Heiner


On 31.01.2019 03:32, David Chang wrote:
> Hi,
> 
> We had a similar case here.
> - Realtek r8169 receive performance regression in kernel 4.19
>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> 
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> The major symptom is a high rx_missed count.
> 
> 
> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>> Hi Peter,
>>
>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>> do the trick, can you also check with pcie_aspm.policy=performance.
> 
> We will give it a try later.
> 
>> And please check with "ethtool -S <if>" whether the chip statistics
>> show a significant number of errors.
>>
>> If this doesn't help you may have to bisect to find the offending commit.
> 
> We tried falling back the driver to a few earlier commits, listed below,
> but with no luck.
> 
> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> 
> Thanks,
> David Chang
> 
>>
>> Heiner
>>
>>
>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>> and this made no difference.
>>>
>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>> confirm that this immediately resolved the issue and access to the NFS
>>> shares operated as expected.
>>>
>>> I presume this means it is an issue with the r8169 driver included in
>>> 4.19 onwards?
>>>
>>> To answer your last questions:
>>>
>>> Base Board Information
>>>     Manufacturer: Alienware
>>>     Product Name: 0PGRP5
>>>     Version: A02
>>>
>>> ... and yes, the RTL8168 is the onboard network chip.
>>>
>>> Regards,
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> I think the vendor driver doesn't enable ASPM per default.
>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>> Few older systems seem to have issues with ASPM, what kind of
>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>> network chip?
>>>>
>>>> Rgds, Heiner
>>>>
>>>>
>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>> a good idea.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>> What you could do:
>>>>>>
>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>
>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>
>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>
>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>> elsewhere in the network subsystem?
>>>>>>
>>>>>> Heiner
>>>>>>
>>>>>>
>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks for getting back to me.
>>>>>>>
>>>>>>> No, I don't use jumbo packets.
>>>>>>>
>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>
>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>> troubleshoot this issue. Running the following
>>>>>>>
>>>>>>>     netstat -s |grep retransmitted
>>>>>>>
>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>> 4.19.18:
>>>>>>>
>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>> the following:
>>>>>>>     real    0m19.867s
>>>>>>>     user    0m0.012s
>>>>>>>     sys    0m0.036s
>>>>>>>
>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>     real    0m0.300s
>>>>>>>     user    0m0.004s
>>>>>>>     sys    0m0.007s
>>>>>>>
>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>
>>>>>>> dmesg XID:
>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>
>>>>>>> # lspci -vv
>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>         Address: 0000000000000000  Data: 0000
>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>> <512ns, L1 <64us
>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>> SlotPowerLimit 10.000W
>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>> OBFF Via message/WAKE#
>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>> OBFF Disabled
>>>>>>>              AtomicOpsCtl: ReqEn-
>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>              Transmit Margin: Normal Operating Range,
>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>              Compliance De-emphasis: -6dB
>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>         Not readable
>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>>>         Status:    InProgress-
>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>             Status:    NegoPending- InProgress-
>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>         Max snoop latency: 71680ns
>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>     Kernel driver in use: r8169
>>>>>>>     Kernel modules: r8169
>>>>>>>
>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>
>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>
>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>> situation.
>>>>>>>>>
>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>
>>>>>>>>> lshw shows:
>>>>>>>>>        description: Ethernet interface
>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>        physical id: 0
>>>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>>>        logical name: enp3s0
>>>>>>>>>        version: 0c
>>>>>>>>>        serial:
>>>>>>>>>        size: 1Gbit/s
>>>>>>>>>        capacity: 1Gbit/s
>>>>>>>>>        width: 64 bits
>>>>>>>>>        clock: 33MHz
>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>
>>>>>>>> - Can you provide any measurements?
>>>>>>>> - iperf results before and after
>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>> - Do you use jumbo packets?
>>>>>>>>
>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>
>>>>>>>> Heiner
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31  6:21                 ` Heiner Kallweit
@ 2019-01-31  6:35                   ` Heiner Kallweit
  2019-01-31  6:49                     ` Heiner Kallweit
                                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-31  6:35 UTC (permalink / raw)
  To: David Chang; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev

Hi David, two more things:

1. Could you please test a recent linux-next kernel?
2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
   and compare them.
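
One possible way to do the comparison, assuming each dump is saved to
a text file while the respective kernel is running. The helper name
and the file names are assumptions for illustration.

```shell
# Hypothetical helper: show the lines that differ between two saved
# "ethtool -d <if>" dumps. Returns 0 whether or not the dumps differ,
# and non-zero only if diff itself fails (e.g. a missing file).
compare_dumps() {
    diff -u "$1" "$2" || [ $? -eq 1 ]
}

# Usage, one dump per kernel (eth0 is an assumed interface name):
#   ethtool -d eth0 > regs-4.18.txt     # while running 4.18
#   ethtool -d eth0 > regs-4.19.txt     # while running 4.19
#   compare_dumps regs-4.18.txt regs-4.19.txt
```

Registers that change with traffic will differ anyway, so the
configuration registers are the ones worth a closer look.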

Heiner


On 31.01.2019 07:21, Heiner Kallweit wrote:
> David, thanks for the link to the bug ticket.
> I think only a proper bisect can help to find the offending commit.
> 
> Heiner
> 
> 
> On 31.01.2019 03:32, David Chang wrote:
>> Hi,
>>
>> We had a similar case here.
>> - Realtek r8169 receive performance regression in kernel 4.19
>>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>
>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>> The major symptom is a high rx_missed count.
>>
>>
>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>> Hi Peter,
>>>
>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>>> do the trick, can you also check with pcie_aspm.policy=performance.
>>
>> We will give it a try later.
>>
>>> And please check with "ethtool -S <if>" whether the chip statistics
>>> show a significant number of errors.
>>>
>>> If this doesn't help you may have to bisect to find the offending commit.
>>
>> We tried falling back the driver to a few earlier commits, listed below,
>> but with no luck.
>>
>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>
>> Thanks,
>> David Chang
>>
>>>
>>> Heiner
>>>
>>>
>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>> Hi Heiner,
>>>>
>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>> and this made no difference.
>>>>
>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>> confirm that this immediately resolved the issue and access to the NFS
>>>> shares operated as expected.
>>>>
>>>> I presume this means it is an issue with the r8169 driver included in
>>>> 4.19 onwards?
>>>>
>>>> To answer your last questions:
>>>>
>>>> Base Board Information
>>>>     Manufacturer: Alienware
>>>>     Product Name: 0PGRP5
>>>>     Version: A02
>>>>
>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>
>>>> Regards,
>>>>
>>>> Peter.
>>>>
>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>> Few older systems seem to have issues with ASPM, what kind of
>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>> network chip?
>>>>>
>>>>> Rgds, Heiner
>>>>>
>>>>>
>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>> Hi Heiner,
>>>>>>
>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>> a good idea.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Peter.
>>>>>>
>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>> What you could do:
>>>>>>>
>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>
>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>
>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>
>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>> elsewhere in the network subsystem?
>>>>>>>
>>>>>>> Heiner
>>>>>>>
>>>>>>>
>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>> Hi Heiner,
>>>>>>>>
>>>>>>>> Thanks for getting back to me.
>>>>>>>>
>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>
>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>
>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>
>>>>>>>>     netstat -s |grep retransmitted
>>>>>>>>
>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>> 4.19.18:
>>>>>>>>
>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>> the following:
>>>>>>>>     real    0m19.867s
>>>>>>>>     user    0m0.012s
>>>>>>>>     sys    0m0.036s
>>>>>>>>
>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>     real    0m0.300s
>>>>>>>>     user    0m0.004s
>>>>>>>>     sys    0m0.007s
>>>>>>>>
>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>
>>>>>>>> dmesg XID:
>>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>
>>>>>>>> # lspci -vv
>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>         Address: 0000000000000000  Data: 0000
>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>> <512ns, L1 <64us
>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>> OBFF Disabled
>>>>>>>>              AtomicOpsCtl: ReqEn-
>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>              Transmit Margin: Normal Operating Range,
>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>              Compliance De-emphasis: -6dB
>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>         Not readable
>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>>>>         Status:    InProgress-
>>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>             Status:    NegoPending- InProgress-
>>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>         Max snoop latency: 71680ns
>>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>>     Kernel driver in use: r8169
>>>>>>>>     Kernel modules: r8169
>>>>>>>>
>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>
>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>
>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>> situation.
>>>>>>>>>>
>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>
>>>>>>>>>> lshw shows:
>>>>>>>>>>        description: Ethernet interface
>>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>        physical id: 0
>>>>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>>>>        logical name: enp3s0
>>>>>>>>>>        version: 0c
>>>>>>>>>>        serial:
>>>>>>>>>>        size: 1Gbit/s
>>>>>>>>>>        capacity: 1Gbit/s
>>>>>>>>>>        width: 64 bits
>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>
>>>>>>>>>> Kind Regards,
>>>>>>>>>>
>>>>>>>>>> Peter.
>>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>
>>>>>>>>> - Can you provide any measurements?
>>>>>>>>> - iperf results before and after
>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>
>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>
>>>>>>>>> Heiner
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31  6:35                   ` Heiner Kallweit
@ 2019-01-31  6:49                     ` Heiner Kallweit
  2019-01-31  7:23                     ` David Chang
  2019-02-01  4:29                     ` David Chang
  2 siblings, 0 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-31  6:49 UTC (permalink / raw)
  To: David Chang; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev

And one more question:

So far I read about the issue only in combination with NFS.
Does the issue also occur with iperf or some other type of
high network load?

Heiner
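
One way to answer this with numbers is to record the TCP retransmit
counter before and after each workload, in the same way the
"netstat -s |grep retransmitted" check was used earlier in the thread.
The following is only a minimal sketch. It assumes the counter line has
the form "<N> segments retransmitted", and the helper names
retrans_count and retrans_delta are hypothetical, not part of any tool.

```shell
#!/bin/sh
# Extract the TCP retransmit counter from `netstat -s`-style text on stdin.
# Assumption: the counter appears on a line "<N> segments retransmitted".
retrans_count() {
    awk '/segments retransmitted/ { print $1; exit }'
}

# Run a command and print how many segments were retransmitted while it ran.
# The command's own output is discarded; only the counter delta is printed.
retrans_delta() {
    before=$(netstat -s | retrans_count)
    "$@" >/dev/null 2>&1
    after=$(netstat -s | retrans_count)
    echo $((after - before))
}
```

For example, "retrans_delta ls <nfs directory>" and
"retrans_delta iperf -c <server>" could be run under both 4.18 and 4.19
to see whether the retransmits are specific to NFS. The counter is
system-wide, so other traffic during the run will inflate the delta.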


On 31.01.2019 07:35, Heiner Kallweit wrote:
> Hi David, two more things:
> 
> 1. Could you please test a recent linux-next kernel?
> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>    and compare them.
> 
> Heiner
> 
> 
> On 31.01.2019 07:21, Heiner Kallweit wrote:
>> David, thanks for the link to the bug ticket.
>> I think only a proper bisect can help to find the offending commit.
>>
>> Heiner
>>
>>
>> On 31.01.2019 03:32, David Chang wrote:
>>> Hi,
>>>
>>> We had a similar case here.
>>> - Realtek r8169 receive performance regression in kernel 4.19
>>>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>>
>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>>> The major symptom is a large rx_missed count.
>>>
>>>
>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>>> Hi Peter,
>>>>
>>>> recently I had a case where pcie_aspm=off for whatever reason didn't
>>>> do the trick. Can you also check with pcie_aspm.policy=performance?
>>>
>>> We will give it a try later.
>>>
>>>> And please check with "ethtool -S <if>" whether the chip statistics
>>>> show a significant number of errors.
>>>>
>>>> If this doesn't help you may have to bisect to find the offending commit.
>>>
>>> We tried rolling the driver back to each of the following earlier
>>> commits, but with no luck.
>>>
>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>>
>>> Thanks,
>>> David Chang
>>>
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>>> and this made no difference.
>>>>>
>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>>> confirm that this immediately resolved the issue and access to the NFS
>>>>> shares operated as expected.
>>>>>
>>>>> I presume this means it is an issue with the r8169 driver included in
>>>>> 4.19 onwards?
>>>>>
>>>>> To answer your last questions:
>>>>>
>>>>> Base Board Information
>>>>>     Manufacturer: Alienware
>>>>>     Product Name: 0PGRP5
>>>>>     Version: A02
>>>>>
>>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>>> Few older systems seem to have issues with ASPM, what kind of
>>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>>> network chip?
>>>>>>
>>>>>> Rgds, Heiner
>>>>>>
>>>>>>
>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>>> a good idea.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>>> What you could do:
>>>>>>>>
>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>>
>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>>
>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>>
>>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>>> elsewhere in the network subsystem?
>>>>>>>>
>>>>>>>> Heiner
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>>> Hi Heiner,
>>>>>>>>>
>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>
>>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>>
>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>>
>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>>
>>>>>>>>>     netstat -s |grep retransmitted
>>>>>>>>>
>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>>> 4.19.18:
>>>>>>>>>
>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>>> the following:
>>>>>>>>>     real    0m19.867s
>>>>>>>>>     user    0m0.012s
>>>>>>>>>     sys    0m0.036s
>>>>>>>>>
>>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>>     real    0m0.300s
>>>>>>>>>     user    0m0.004s
>>>>>>>>>     sys    0m0.007s
>>>>>>>>>
>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>>
>>>>>>>>> dmesg XID:
>>>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>>
>>>>>>>>> # lspci -vv
>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>>         Address: 0000000000000000  Data: 0000
>>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>>> <512ns, L1 <64us
>>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>>> OBFF Disabled
>>>>>>>>>              AtomicOpsCtl: ReqEn-
>>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>>              Transmit Margin: Normal Operating Range,
>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>>              Compliance De-emphasis: -6dB
>>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>>         Not readable
>>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>>>>>         Status:    InProgress-
>>>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>>             Status:    NegoPending- InProgress-
>>>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>>         Max snoop latency: 71680ns
>>>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>>>     Kernel driver in use: r8169
>>>>>>>>>     Kernel modules: r8169
>>>>>>>>>
>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>>
>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>>
>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>>> situation.
>>>>>>>>>>>
>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>>
>>>>>>>>>>> lshw shows:
>>>>>>>>>>>        description: Ethernet interface
>>>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>>>>>        logical name: enp3s0
>>>>>>>>>>>        version: 0c
>>>>>>>>>>>        serial:
>>>>>>>>>>>        size: 1Gbit/s
>>>>>>>>>>>        capacity: 1Gbit/s
>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>>
>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>
>>>>>>>>>>> Peter.
>>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>>
>>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>> - iperf results before and after
>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>>
>>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>>
>>>>>>>>>> Heiner
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31  6:35                   ` Heiner Kallweit
  2019-01-31  6:49                     ` Heiner Kallweit
@ 2019-01-31  7:23                     ` David Chang
  2019-01-31 12:09                       ` Peter Ceiley
  2019-02-01  4:29                     ` David Chang
  2 siblings, 1 reply; 25+ messages in thread
From: David Chang @ 2019-01-31  7:23 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Peter Ceiley, Realtek linux nic maintainers, netdev

Hi Heiner,

On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> Hi David, two more things:
> 
> 1. Could you please test a recent linux-next kernel?
> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>    and compare them.

I'm sorry, I don't have the affected machine at hand. I will ask
our user to run the tests. Thanks!

Regards,
David
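
The register comparison requested above could be done roughly as
follows. This is only a sketch: "ethtool -d <if>" is the command named
in the request, while the helper names save_regdump and
compare_regdumps and the output file naming are hypothetical.

```shell
#!/bin/sh
# Save a register dump tagged with the running kernel version.
# Usage: save_regdump <interface> [output directory]
save_regdump() {
    ethtool -d "$1" > "${2:-.}/regdump-$(uname -r).txt"
}

# Show the lines that differ between two saved dumps as a unified diff.
# Exit status follows diff: 0 if identical, 1 if the dumps differ.
compare_regdumps() {
    diff -u "$1" "$2"
}
```

The idea is to run save_regdump once after booting 4.18 and once after
booting 4.19, with the link in the same state both times, and then pass
the two files to compare_regdumps.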

> 
> Heiner
> 
> 
> On 31.01.2019 07:21, Heiner Kallweit wrote:
> > David, thanks for the link to the bug ticket.
> > I think only a proper bisect can help to find the offending commit.
> > 
> > Heiner
> > 
> > 
> > On 31.01.2019 03:32, David Chang wrote:
> >> Hi,
> >>
> >> We had a similar case here.
> >> - Realtek r8169 receive performance regression in kernel 4.19
> >>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>
> >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >> The major symptom is a large rx_missed count.
> >>
> >>
> >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>> Hi Peter,
> >>>
> >>> recently I had a case where pcie_aspm=off for whatever reason didn't
> >>> do the trick. Can you also check with pcie_aspm.policy=performance?
> >>
> >> We will give it a try later.
> >>
> >>> And please check with "ethtool -S <if>" whether the chip statistics
> >>> show a significant number of errors.
> >>>
> >>> If this doesn't help you may have to bisect to find the offending commit.
> >>
> >> We tried rolling the driver back to each of the following earlier
> >> commits, but with no luck.
> >>
> >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>
> >> Thanks,
> >> David Chang
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>> Hi Heiner,
> >>>>
> >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>> and this made no difference.
> >>>>
> >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>> confirm that this immediately resolved the issue and access to the NFS
> >>>> shares operated as expected.
> >>>>
> >>>> I presume this means it is an issue with the r8169 driver included in
> >>>> 4.19 onwards?
> >>>>
> >>>> To answer your last questions:
> >>>>
> >>>> Base Board Information
> >>>>     Manufacturer: Alienware
> >>>>     Product Name: 0PGRP5
> >>>>     Version: A02
> >>>>
> >>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Peter.
> >>>>
> >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>
> >>>>> Hi Peter,
> >>>>>
> >>>>> I think the vendor driver doesn't enable ASPM per default.
> >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>>> Few older systems seem to have issues with ASPM, what kind of
> >>>>> system / mainboard are you using? The RTL8168 is the onboard
> >>>>> network chip?
> >>>>>
> >>>>> Rgds, Heiner
> >>>>>
> >>>>>
> >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>> Hi Heiner,
> >>>>>>
> >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>> a good idea.
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Peter.
> >>>>>>
> >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Peter,
> >>>>>>>
> >>>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>>> What you could do:
> >>>>>>>
> >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>
> >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>
> >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>>
> >>>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>>> elsewhere in the network subsystem?
> >>>>>>>
> >>>>>>> Heiner
> >>>>>>>
> >>>>>>>
> >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>> Hi Heiner,
> >>>>>>>>
> >>>>>>>> Thanks for getting back to me.
> >>>>>>>>
> >>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>
> >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>
> >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>> troubleshoot this issue. Running the following
> >>>>>>>>
> >>>>>>>>     netstat -s |grep retransmitted
> >>>>>>>>
> >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>> 4.19.18:
> >>>>>>>>
> >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>> the following:
> >>>>>>>>     real    0m19.867s
> >>>>>>>>     user    0m0.012s
> >>>>>>>>     sys    0m0.036s
> >>>>>>>>
> >>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>>     real    0m0.300s
> >>>>>>>>     user    0m0.004s
> >>>>>>>>     sys    0m0.007s
> >>>>>>>>
> >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>>
> >>>>>>>> dmesg XID:
> >>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>>
> >>>>>>>> # lspci -vv
> >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>>     Interrupt: pin A routed to IRQ 19
> >>>>>>>>     Region 0: I/O ports at d000 [size=256]
> >>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>>     Capabilities: [40] Power Management version 3
> >>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>>         Address: 0000000000000000  Data: 0000
> >>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>> <512ns, L1 <64us
> >>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>> OBFF Via message/WAKE#
> >>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>> OBFF Disabled
> >>>>>>>>              AtomicOpsCtl: ReqEn-
> >>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>>              Transmit Margin: Normal Operating Range,
> >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>>              Compliance De-emphasis: -6dB
> >>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>>         Vector table: BAR=4 offset=00000000
> >>>>>>>>         PBA: BAR=4 offset=00000800
> >>>>>>>>     Capabilities: [d0] Vital Product Data
> >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>>         Not readable
> >>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>>     Capabilities: [140 v1] Virtual Channel
> >>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>>>>>>         Ctrl:    ArbSelect=Fixed
> >>>>>>>>         Status:    InProgress-
> >>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>>             Status:    NegoPending- InProgress-
> >>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>>         Max snoop latency: 71680ns
> >>>>>>>>         Max no snoop latency: 71680ns
> >>>>>>>>     Kernel driver in use: r8169
> >>>>>>>>     Kernel modules: r8169
> >>>>>>>>
> >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> Peter.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>
> >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>
> >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>> situation.
> >>>>>>>>>>
> >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>
> >>>>>>>>>> lshw shows:
> >>>>>>>>>>        description: Ethernet interface
> >>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>        physical id: 0
> >>>>>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>>>>        logical name: enp3s0
> >>>>>>>>>>        version: 0c
> >>>>>>>>>>        serial:
> >>>>>>>>>>        size: 1Gbit/s
> >>>>>>>>>>        capacity: 1Gbit/s
> >>>>>>>>>>        width: 64 bits
> >>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>
> >>>>>>>>>> Kind Regards,
> >>>>>>>>>>
> >>>>>>>>>> Peter.
> >>>>>>>>>>
> >>>>>>>>> Hi Peter,
> >>>>>>>>>
> >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>
> >>>>>>>>> - Can you provide any measurements?
> >>>>>>>>> - iperf results before and after
> >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>
> >>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>
> >>>>>>>>> Heiner
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> > 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31  7:23                     ` David Chang
@ 2019-01-31 12:09                       ` Peter Ceiley
  2019-01-31 18:28                         ` Heiner Kallweit
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Ceiley @ 2019-01-31 12:09 UTC (permalink / raw)
  To: David Chang; +Cc: Heiner Kallweit, Realtek linux nic maintainers, netdev

Hi Heiner,

A quick update on my testing with different pcie_aspm settings:

pcie_aspm=off                   | no change
pcie_aspm.policy=default        | no change
pcie_aspm.policy=performance    | issue resolved
pcie_aspm.policy=powersave      | issue resolved
pcie_aspm.policy=powersupersave | issue resolved

It seems the new driver does not play nicely with the default ASPM policy.
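
To confirm which policy is in effect after booting with one of these
parameters, the active entry can be read back from sysfs. This is a minimal
sketch, assuming the kernel lists the policies in
/sys/module/pcie_aspm/parameters/policy with the active one in square
brackets; the function name active_aspm_policy is only illustrative.

```shell
#!/bin/sh
# Print the active ASPM policy from a line such as
#   "[default] performance powersave powersupersave"
# The active entry is the one enclosed in square brackets.
# Prints nothing if the line contains no bracketed entry.
active_aspm_policy() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}
```

On a running system it could be called as
active_aspm_policy "$(cat /sys/module/pcie_aspm/parameters/policy)".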

As requested, I've included the output of ethtool -S below, captured while
experiencing the issue. Note that no errors are recorded.

# ethtool -S enp3s0
NIC statistics:
     tx_packets: 2749
     rx_packets: 4089
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 4078
     broadcast: 9
     multicast: 2
     tx_aborted: 0
     tx_underrun: 0

David, I hope this helps your user as well. I appreciate you sharing
the bug ticket - thanks.

Heiner, thanks very much for your help to date.

Regards,

Peter.

On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@suse.com> wrote:
>
> Hi Heiner,
>
> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> > Hi David, two more things:
> >
> > 1. Could you please test a recent linux-next kernel?
> > 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> >    and compare them.
>
> I'm sorry that I do not have the issue machine handy. I would ask
> our user to do the test. Thanks!
>
> Regards,
> David
>
> >
> > Heiner
> >
> >
> > On 31.01.2019 07:21, Heiner Kallweit wrote:
> > > David, thanks for the link to the bug ticket.
> > > I think only a proper bisect can help to find the offending commit.
> > >
> > > Heiner
> > >
> > >
> > > On 31.01.2019 03:32, David Chang wrote:
> > >> Hi,
> > >>
> > > >> We had a similar case here.
> > >> - Realtek r8169 receive performance regression in kernel 4.19
> > >>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> > >>
> > >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > >> The major symptom is there are many rx_missed count.
> > >>
> > >>
> > >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> > >>> Hi Peter,
> > >>>
> > >>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> > >>> do the trick, can you also check with pcie_aspm.policy=performance.
> > >>
> > >> We will give it a try later.
> > >>
> > >>> And please check with "ethtool -S <if>" whether the chip statistics
> > >>> show a significant number of errors.
> > >>>
> > >>> If this doesn't help you may have to bisect to find the offending commit.
> > >>
> > >> We had tried fallback driver to a few previous commits as following,
> > >> but with no luck.
> > >>
> > >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> > >>
> > >> Thanks,
> > >> David Chang
> > >>
> > >>>
> > >>> Heiner
> > >>>
> > >>>
> > >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> > >>>> Hi Heiner,
> > >>>>
> > >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> > >>>> and this made no difference.
> > >>>>
> > >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> > >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> > >>>> confirm that this immediately resolved the issue and access to the NFS
> > >>>> shares operated as expected.
> > >>>>
> > >>>> I presume this means it is an issue with the r8169 driver included in
> > >>>> 4.19 onwards?
> > >>>>
> > >>>> To answer your last questions:
> > >>>>
> > >>>> Base Board Information
> > >>>>     Manufacturer: Alienware
> > >>>>     Product Name: 0PGRP5
> > >>>>     Version: A02
> > >>>>
> > >>>> ... and yes, the RTL8168 is the onboard network chip.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Peter.
> > >>>>
> > >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>
> > >>>>> Hi Peter,
> > >>>>>
> > >>>>> I think the vendor driver doesn't enable ASPM per default.
> > >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> > >>>>> Few older systems seem to have issues with ASPM, what kind of
> > >>>>> system / mainboard are you using? The RTL8168 is the onboard
> > >>>>> network chip?
> > >>>>>
> > >>>>> Rgds, Heiner
> > >>>>>
> > >>>>>
> > >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> > >>>>>> Hi Heiner,
> > >>>>>>
> > >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> > >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> > >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> > >>>>>> a good idea.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>>
> > >>>>>> Peter.
> > >>>>>>
> > >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> Hi Peter,
> > >>>>>>>
> > >>>>>>> at a first glance it doesn't look like a typical driver issue.
> > >>>>>>> What you could do:
> > >>>>>>>
> > >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> > >>>>>>>
> > >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> > >>>>>>>
> > >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> > >>>>>>>
> > >>>>>>> Any specific reason why you think root cause is in the driver and not
> > >>>>>>> elsewhere in the network subsystem?
> > >>>>>>>
> > >>>>>>> Heiner
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> > >>>>>>>> Hi Heiner,
> > >>>>>>>>
> > >>>>>>>> Thanks for getting back to me.
> > >>>>>>>>
> > >>>>>>>> No, I don't use jumbo packets.
> > >>>>>>>>
> > >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> > >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> > >>>>>>>> establishing a connection and is most notable, for example, on my
> > >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> > >>>>>>>> larger directories) to list the contents of each directory. Once a
> > >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> > >>>>>>>>
> > >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> > >>>>>>>> troubleshoot this issue. Running the following
> > >>>>>>>>
> > >>>>>>>>     netstat -s |grep retransmitted
> > >>>>>>>>
> > >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> > >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> > >>>>>>>> directory containing 345 media files did the following using kernel
> > >>>>>>>> 4.19.18:
> > >>>>>>>>
> > >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> > >>>>>>>> the following:
> > >>>>>>>>     real    0m19.867s
> > >>>>>>>>     user    0m0.012s
> > >>>>>>>>     sys    0m0.036s
> > >>>>>>>>
> > >>>>>>>> The same command shows no retransmitted segments running kernel
> > >>>>>>>> 4.18.16 and 'time' showed:
> > >>>>>>>>     real    0m0.300s
> > >>>>>>>>     user    0m0.004s
> > >>>>>>>>     sys    0m0.007s
> > >>>>>>>>
> > >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> > >>>>>>>>
> > >>>>>>>> dmesg XID:
> > >>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> > >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> > >>>>>>>>
> > >>>>>>>> # lspci -vv
> > >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> > >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> > >>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> > >>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> > >>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> > >>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> > >>>>>>>>     Interrupt: pin A routed to IRQ 19
> > >>>>>>>>     Region 0: I/O ports at d000 [size=256]
> > >>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> > >>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> > >>>>>>>>     Capabilities: [40] Power Management version 3
> > >>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> > >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> > >>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > >>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> > >>>>>>>>         Address: 0000000000000000  Data: 0000
> > >>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> > >>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> > >>>>>>>> <512ns, L1 <64us
> > >>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> > >>>>>>>> SlotPowerLimit 10.000W
> > >>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > >>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > >>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> > >>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > >>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> > >>>>>>>> Latency L0s unlimited, L1 <64us
> > >>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> > >>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> > >>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> > >>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> > >>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > >>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> > >>>>>>>> OBFF Via message/WAKE#
> > >>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> > >>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> > >>>>>>>> OBFF Disabled
> > >>>>>>>>              AtomicOpsCtl: ReqEn-
> > >>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> > >>>>>>>>              Transmit Margin: Normal Operating Range,
> > >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> > >>>>>>>>              Compliance De-emphasis: -6dB
> > >>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> > >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> > >>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> > >>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> > >>>>>>>>         Vector table: BAR=4 offset=00000000
> > >>>>>>>>         PBA: BAR=4 offset=00000800
> > >>>>>>>>     Capabilities: [d0] Vital Product Data
> > >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> > >>>>>>>>         Not readable
> > >>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> > >>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> > >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> > >>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> > >>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> > >>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> > >>>>>>>> ECRCChkCap+ ECRCChkEn-
> > >>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> > >>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> > >>>>>>>>     Capabilities: [140 v1] Virtual Channel
> > >>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> > >>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> > >>>>>>>>         Ctrl:    ArbSelect=Fixed
> > >>>>>>>>         Status:    InProgress-
> > >>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > >>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> > >>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> > >>>>>>>>             Status:    NegoPending- InProgress-
> > >>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> > >>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> > >>>>>>>>         Max snoop latency: 71680ns
> > >>>>>>>>         Max no snoop latency: 71680ns
> > >>>>>>>>     Kernel driver in use: r8169
> > >>>>>>>>     Kernel modules: r8169
> > >>>>>>>>
> > >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> > >>>>>>>>
> > >>>>>>>> Thanks!
> > >>>>>>>>
> > >>>>>>>> Peter.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> > >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> > >>>>>>>>>>
> > >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> > >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> > >>>>>>>>>> 4.20.4 & 4.19.18).
> > >>>>>>>>>>
> > >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> > >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> > >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> > >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> > >>>>>>>>>> reload the driver on a running system, but this does not improve the
> > >>>>>>>>>> situation.
> > >>>>>>>>>>
> > >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> > >>>>>>>>>>
> > >>>>>>>>>> lshw shows:
> > >>>>>>>>>>        description: Ethernet interface
> > >>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> > >>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> > >>>>>>>>>>        physical id: 0
> > >>>>>>>>>>        bus info: pci@0000:03:00.0
> > >>>>>>>>>>        logical name: enp3s0
> > >>>>>>>>>>        version: 0c
> > >>>>>>>>>>        serial:
> > >>>>>>>>>>        size: 1Gbit/s
> > >>>>>>>>>>        capacity: 1Gbit/s
> > >>>>>>>>>>        width: 64 bits
> > >>>>>>>>>>        clock: 33MHz
> > >>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> > >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> > >>>>>>>>>> 1000bt-fd autonegotiation
> > >>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> > >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> > >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> > >>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> > >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> > >>>>>>>>>>
> > >>>>>>>>>> Kind Regards,
> > >>>>>>>>>>
> > >>>>>>>>>> Peter.
> > >>>>>>>>>>
> > >>>>>>>>> Hi Peter,
> > >>>>>>>>>
> > >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> > >>>>>>>>>
> > >>>>>>>>> - Can you provide any measurements?
> > >>>>>>>>> - iperf results before and after
> > >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> > >>>>>>>>> - Do you use jumbo packets?
> > >>>>>>>>>
> > >>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> > >>>>>>>>> the dmesg output line with the chip XID.
> > >>>>>>>>>
> > >>>>>>>>> Heiner
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
> >

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31 12:09                       ` Peter Ceiley
@ 2019-01-31 18:28                         ` Heiner Kallweit
  2019-02-01  4:27                           ` David Chang
  0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-01-31 18:28 UTC (permalink / raw)
  To: Peter Ceiley, David Chang; +Cc: Realtek linux nic maintainers, netdev

Thanks for testing, Peter!
So we have an ASPM-related issue indeed. I'm aware that there are certain
incompatibilities between board chipsets and network chip versions
(although it's not known which combinations are affected).
And we don't know whether it's a hardware or BIOS issue.

Older driver versions dealt with this by simply disabling ASPM in general.
As a result, no system with a supported Realtek chip reached the higher
package power-saving states, which significantly reduced battery
life on notebooks.
The network driver plays no part in handling the ASPM policies; this
is done by the lower PCI layers.

Unfortunately we can't detect ASPM incompatibilities at runtime. Maybe
we could build some heuristics based on rx_missed percentage, but it's
not clear that ASPM issues always show the same symptoms.
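
As a rough illustration of such a heuristic, the sketch below computes the
share of missed frames from "ethtool -S" output. The function name and the
choice of denominator (rx_packets plus rx_missed) are assumptions made for
this example, not something the driver does.

```shell
#!/bin/sh
# Read "ethtool -S <if>" output on stdin and print rx_missed as an integer
# percentage of (rx_packets + rx_missed). Prints nothing if both are zero
# or the counters are absent.
rx_missed_percent() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }            # strip leading indentation
        $1 == "rx_packets" { pkts = $2 + 0 }
        $1 == "rx_missed"  { missed = $2 + 0 }
        END {
            total = pkts + missed
            if (total > 0) printf "%d\n", (missed * 100) / total
        }
    '
}
```

Usage would be along the lines of: ethtool -S enp3s0 | rx_missed_percent.
For the statistics Peter posted (rx_missed: 0) this prints 0 even though the
issue was present, which shows why rx_missed alone may not be a reliable
indicator.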

So for now people with affected systems have to set a proper
pcie_aspm.policy parameter.
What is still not clear to me is why pcie_aspm=off doesn't help.
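
Besides the boot parameter (e.g. pcie_aspm.policy=performance), the policy
can usually also be changed at runtime through sysfs. The following is only
a sketch: the helper name set_aspm_policy and the default path are
assumptions, writing requires root, and the write may be rejected on systems
where ASPM control is not available to the OS.

```shell
#!/bin/sh
# Write an ASPM policy name to the policy file, but only if the file
# currently lists that policy. $1 = policy name, $2 = optional file path
# (defaults to the assumed sysfs location).
set_aspm_policy() {
    policy=$1
    file=${2:-/sys/module/pcie_aspm/parameters/policy}
    # The file lists the available policies, the active one in brackets;
    # drop the brackets and look for an exact match on one entry.
    if ! sed 's/[][]//g' "$file" | tr ' ' '\n' | grep -qxF -- "$policy"; then
        echo "policy '$policy' not offered by $file" >&2
        return 1
    fi
    printf '%s\n' "$policy" > "$file"
}
```

For example, as root: set_aspm_policy performance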

@David:
I assume you'll check with the affected user to test the ASPM policy
parameter.

Heiner


On 31.01.2019 13:09, Peter Ceiley wrote:
> Hi Heiner,
> 
> A quick update on my testing with different pcie_aspm settings:
> 
> pcie_aspm=off | no change
> pcie_aspm.policy=default | no change
> pcie_aspm.policy=performance | issue resolved
> pcie_aspm.policy=powersave | issue resolved
> pcie_aspm.policy=powersupersave | issue resolved
> 
> It seems the new driver does not play nicely with the default ASPM policy.
> 
> As requested, I've included an output of ethtool below when experiencing
> the issue - note that no errors are recorded.
> 
> # ethtool -S enp3s0
> NIC statistics:
>      tx_packets: 2749
>      rx_packets: 4089
>      tx_errors: 0
>      rx_errors: 0
>      rx_missed: 0
>      align_errors: 0
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 4078
>      broadcast: 9
>      multicast: 2
>      tx_aborted: 0
>      tx_underrun: 0
> 
> David, I hope this helps for your user as well. I appreciate you sharing
> the bug ticket - thanks.
> 
> Heiner, thanks very much for your help to date.
> 
> Regards,
> 
> Peter.
> 
> On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@suse.com> wrote:
>>
>> Hi Heiner,
>>
>> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
>>> Hi David, two more things:
>>>
>>> 1. Could you please test a recent linux-next kernel?
>>> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>>>    and compare them.
>>
>> I'm sorry that I do not have the issue machine handy. I would ask
>> our user to do the test. Thanks!
>>
>> Regards,
>> David
>>
>>>
>>> Heiner
>>>
>>>
>>> On 31.01.2019 07:21, Heiner Kallweit wrote:
>>>> David, thanks for the link to the bug ticket.
>>>> I think only a proper bisect can help to find the offending commit.
>>>>
>>>> Heiner
>>>>
>>>>
>>>> On 31.01.2019 03:32, David Chang wrote:
>>>>> Hi,
>>>>>
>>>>> We had a similar case here.
>>>>> - Realtek r8169 receive performance regression in kernel 4.19
>>>>>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>>>>
>>>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>>>>> The major symptom is there are many rx_missed count.
>>>>>
>>>>>
>>>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>>>>> Hi Peter,
>>>>>>
>>>>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>>>>>> do the trick, can you also check with pcie_aspm.policy=performance.
>>>>>
>>>>> We will give it a try later.
>>>>>
>>>>>> And please check with "ethtool -S <if>" whether the chip statistics
>>>>>> show a significant number of errors.
>>>>>>
>>>>>> If this doesn't help you may have to bisect to find the offending commit.
>>>>>
>>>>> We had tried fallback driver to a few previous commits as following,
>>>>> but with no luck.
>>>>>
>>>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>>>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>>>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>>>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>>>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>>>>
>>>>> Thanks,
>>>>> David Chang
>>>>>
>>>>>>
>>>>>> Heiner
>>>>>>
>>>>>>
>>>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>>>>> and this made no difference.
>>>>>>>
>>>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>>>>> confirm that this immediately resolved the issue and access to the NFS
>>>>>>> shares operated as expected.
>>>>>>>
>>>>>>> I presume this means it is an issue with the r8169 driver included in
>>>>>>> 4.19 onwards?
>>>>>>>
>>>>>>> To answer your last questions:
>>>>>>>
>>>>>>> Base Board Information
>>>>>>>     Manufacturer: Alienware
>>>>>>>     Product Name: 0PGRP5
>>>>>>>     Version: A02
>>>>>>>
>>>>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>>>>> Few older systems seem to have issues with ASPM, what kind of
>>>>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>>>>> network chip?
>>>>>>>>
>>>>>>>> Rgds, Heiner
>>>>>>>>
>>>>>>>>
>>>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>>>>> Hi Heiner,
>>>>>>>>>
>>>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>>>>> a good idea.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Peter,
>>>>>>>>>>
>>>>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>>>>> What you could do:
>>>>>>>>>>
>>>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>>>>
>>>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>>>>
>>>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>>>>
>>>>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>>>>> elsewhere in the network subsystem?
>>>>>>>>>>
>>>>>>>>>> Heiner
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>>>>> Hi Heiner,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>>>
>>>>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>>>>
>>>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>>>>
>>>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>>>>
>>>>>>>>>>>     netstat -s |grep retransmitted
>>>>>>>>>>>
>>>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>>>>> 4.19.18:
>>>>>>>>>>>
>>>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>>>>> the following:
>>>>>>>>>>>     real    0m19.867s
>>>>>>>>>>>     user    0m0.012s
>>>>>>>>>>>     sys    0m0.036s
>>>>>>>>>>>
>>>>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>>>>     real    0m0.300s
>>>>>>>>>>>     user    0m0.004s
>>>>>>>>>>>     sys    0m0.007s
>>>>>>>>>>>
>>>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>>>>
>>>>>>>>>>> dmesg XID:
>>>>>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>>>>
>>>>>>>>>>> # lspci -vv
>>>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>>>>         Address: 0000000000000000  Data: 0000
>>>>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>>>>> <512ns, L1 <64us
>>>>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>>>>> OBFF Disabled
>>>>>>>>>>>              AtomicOpsCtl: ReqEn-
>>>>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>>>>              Transmit Margin: Normal Operating Range,
>>>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>>>>              Compliance De-emphasis: -6dB
>>>>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>>>>         Not readable
>>>>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>>>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>>>>>>>         Status:    InProgress-
>>>>>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>>>>             Status:    NegoPending- InProgress-
>>>>>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>>>>         Max snoop latency: 71680ns
>>>>>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>>>>>     Kernel driver in use: r8169
>>>>>>>>>>>     Kernel modules: r8169
>>>>>>>>>>>
>>>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>> Peter.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>>>>
>>>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>>>>
>>>>>>>>>>>>> lshw shows:
>>>>>>>>>>>>>        description: Ethernet interface
>>>>>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>>>>>>>        logical name: enp3s0
>>>>>>>>>>>>>        version: 0c
>>>>>>>>>>>>>        serial:
>>>>>>>>>>>>>        size: 1Gbit/s
>>>>>>>>>>>>>        capacity: 1Gbit/s
>>>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter.
>>>>>>>>>>>>>
>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>
>>>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>>>>
>>>>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>>>> - iperf results before and after
>>>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>>>>
>>>>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>>>>
>>>>>>>>>>>> Heiner
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
> 
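
For anyone repeating the test, the ASPM state of the link can be read
from the "LnkCtl:" line of the "lspci -vv" output quoted above. A minimal
sketch, assuming the line keeps the "ASPM <state>;" layout shown in that
output (the function name is illustrative only, not part of any tool):

```python
import re

# Matches the "LnkCtl:" line of `lspci -vv`, for example:
#   LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
#   LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
_LNKCTL_ASPM = re.compile(r"LnkCtl:\s+ASPM\s+([^;]+);")


def aspm_link_state(lspci_text):
    """Return the ASPM field of the first LnkCtl line, or None if absent."""
    match = _LNKCTL_ASPM.search(lspci_text)
    return match.group(1).strip() if match else None


sample = "        LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+"
print(aspm_link_state(sample))  # prints: L1 Enabled
```

Running this against the output captured under each kernel parameter
would show whether the parameter actually changed the link setting.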


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31 18:28                         ` Heiner Kallweit
@ 2019-02-01  4:27                           ` David Chang
  0 siblings, 0 replies; 25+ messages in thread
From: David Chang @ 2019-02-01  4:27 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Peter Ceiley, Realtek linux nic maintainers, netdev, Martti Laaksonen

On Jan 31, 2019 at 19:28:20 +0100, Heiner Kallweit wrote:
> Thanks for testing, Peter!
> So we have an ASPM-related issue indeed. I'm aware that there are certain
> incompatibilities between board chipsets and network chip versions
> (although it's not known which combinations are affected).
> And we don't know whether it's a hardware or BIOS issue.
> 
> Older driver versions dealt with this by simply disabling ASPM in general.
> As a result all systems with a supported Realtek chip didn't reach higher
> package power-saving states, resulting in significantly reduced battery
> lifetime on notebooks.
> The network driver has no stake in dealing with the ASPM policies, this
> is handled by lower PCI layers.
> 
> Unfortunately we can't detect ASPM incompatibilities at runtime. Maybe
> we could build some heuristics based on rx_missed percentage, but it's
> not clear that ASPM issues always show the same symptoms.
> 
> So for now people with affected systems have to set a proper
> pcie_aspm.policy parameter.
> Just what is not clear to me is why pcie_aspm=off doesn't help.
> 
> @David:
> I assume you'll check with the affected user to test the ASPM policy
> parameter.

Unfortunately, we did not see any performance improvement when
using both kernel parameters.

@Peter, thanks for the information.

regards,
David
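
When comparing results between systems, it may help to confirm which ASPM
policy is actually active after boot. A minimal sketch, assuming the
kernel exposes the policy at /sys/module/pcie_aspm/parameters/policy with
the active entry in square brackets; the function names are illustrative
only:

```python
def active_aspm_policy(policy_text):
    """Return the bracketed entry, e.g. 'default' from '[default] performance'."""
    for token in policy_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_aspm_policy(path="/sys/module/pcie_aspm/parameters/policy"):
    # Assumed path; it is only present when the kernel has ASPM support built in.
    with open(path) as handle:
        return active_aspm_policy(handle.read())


print(active_aspm_policy("[default] performance powersave powersupersave"))
# prints: default
```

Recording this value next to each test result makes it easier to tell a
parameter that had no effect from one that was never applied.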
> 
> Heiner
> 
> 
> On 31.01.2019 13:09, Peter Ceiley wrote:
> > Hi Heiner,
> > 
> > A quick update on my testing with different pcie_aspm settings:
> > 
> > pcie_aspm=off | no change
> > pcie_aspm.policy=default | no change
> > pcie_aspm.policy=performance | issue resolved
> > pcie_aspm.policy=powersave | issue resolved
> > pcie_aspm.policy=powersupersave | issue resolved
> > 
> > It seems the new driver does not play nicely with the default ASPM policy.
> > 
> > As requested, I've included an output of ethtool below when experiencing
> > the issue - note that no errors are recorded.
> > 
> > # ethtool -S enp3s0
> > NIC statistics:
> >      tx_packets: 2749
> >      rx_packets: 4089
> >      tx_errors: 0
> >      rx_errors: 0
> >      rx_missed: 0
> >      align_errors: 0
> >      tx_single_collisions: 0
> >      tx_multi_collisions: 0
> >      unicast: 4078
> >      broadcast: 9
> >      multicast: 2
> >      tx_aborted: 0
> >      tx_underrun: 0
> > 
> > David, I hope this helps for your user as well. I appreciate you sharing
> > the bug ticket - thanks.
> > 
> > Heiner, thanks very much for your help to date.
> > 
> > Regards,
> > 
> > Peter.
> > 
> > On Thu, 31 Jan 2019 at 18:23, David Chang <dchang@suse.com> wrote:
> >>
> >> Hi Heiner,
> >>
> >> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> >>> Hi David, two more things:
> >>>
> >>> 1. Could you please test a recent linux-next kernel?
> >>> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
> >>>    and compare them.
> >>
> >> I'm sorry that I do not have the issue machine handy. I would ask
> >> our user to do the test. Thanks!
> >>
> >> Regards,
> >> David
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 31.01.2019 07:21, Heiner Kallweit wrote:
> >>>> David, thanks for the link to the bug ticket.
> >>>> I think only a proper bisect can help to find the offending commit.
> >>>>
> >>>> Heiner
> >>>>
> >>>>
> >>>> On 31.01.2019 03:32, David Chang wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We had a similar case here.
> >>>>> - Realtek r8169 receive performance regression in kernel 4.19
> >>>>>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>>>>
> >>>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >>>>> The major symptom is a high rx_missed count.
> >>>>>
> >>>>>
> >>>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >>>>>> do the trick, can you also check with pcie_aspm.policy=performance.
> >>>>>
> >>>>> We will give it a try later.
> >>>>>
> >>>>>> And please check with "ethtool -S <if>" whether the chip statistics
> >>>>>> show a significant number of errors.
> >>>>>>
> >>>>>> If this doesn't help you may have to bisect to find the offending commit.
> >>>>>
> >>>>> We had tried fallback driver to a few previous commits as following,
> >>>>> but with no luck.
> >>>>>
> >>>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >>>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >>>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >>>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >>>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>>>>
> >>>>> Thanks,
> >>>>> David Chang
> >>>>>
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>>
> >>>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>>>>> Hi Heiner,
> >>>>>>>
> >>>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>>>>> and this made no difference.
> >>>>>>>
> >>>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>>>>> confirm that this immediately resolved the issue and access to the NFS
> >>>>>>> shares operated as expected.
> >>>>>>>
> >>>>>>> I presume this means it is an issue with the r8169 driver included in
> >>>>>>> 4.19 onwards?
> >>>>>>>
> >>>>>>> To answer your last questions:
> >>>>>>>
> >>>>>>> Base Board Information
> >>>>>>>     Manufacturer: Alienware
> >>>>>>>     Product Name: 0PGRP5
> >>>>>>>     Version: A02
> >>>>>>>
> >>>>>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Peter,
> >>>>>>>>
> >>>>>>>> I think the vendor driver doesn't enable ASPM per default.
> >>>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>>>>>> Few older systems seem to have issues with ASPM, what kind of
> >>>>>>>> system / mainboard are you using? The RTL8168 is the onboard
> >>>>>>>> network chip?
> >>>>>>>>
> >>>>>>>> Rgds, Heiner
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>>>>> Hi Heiner,
> >>>>>>>>>
> >>>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>>>>> a good idea.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>>
> >>>>>>>>> Peter.
> >>>>>>>>>
> >>>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Peter,
> >>>>>>>>>>
> >>>>>>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>>>>>> What you could do:
> >>>>>>>>>>
> >>>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>>>>
> >>>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>>>>
> >>>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>>>>>
> >>>>>>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>>>>>> elsewhere in the network subsystem?
> >>>>>>>>>>
> >>>>>>>>>> Heiner
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>>>>> Hi Heiner,
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks for getting back to me.
> >>>>>>>>>>>
> >>>>>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>>>>
> >>>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>>>>
> >>>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>>>>> troubleshoot this issue. Running the following
> >>>>>>>>>>>
> >>>>>>>>>>>     netstat -s |grep retransmitted
> >>>>>>>>>>>
> >>>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>>>>> 4.19.18:
> >>>>>>>>>>>
> >>>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>>>>> the following:
> >>>>>>>>>>>     real    0m19.867s
> >>>>>>>>>>>     user    0m0.012s
> >>>>>>>>>>>     sys    0m0.036s
> >>>>>>>>>>>
> >>>>>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>>>>>     real    0m0.300s
> >>>>>>>>>>>     user    0m0.004s
> >>>>>>>>>>>     sys    0m0.007s
> >>>>>>>>>>>
> >>>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>>>>>
> >>>>>>>>>>> dmesg XID:
> >>>>>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>>>>>
> >>>>>>>>>>> # lspci -vv
> >>>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>>>>>     Interrupt: pin A routed to IRQ 19
> >>>>>>>>>>>     Region 0: I/O ports at d000 [size=256]
> >>>>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>>>>>     Capabilities: [40] Power Management version 3
> >>>>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>>>>>         Address: 0000000000000000  Data: 0000
> >>>>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>>>>> <512ns, L1 <64us
> >>>>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>>>>> OBFF Via message/WAKE#
> >>>>>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>>>>> OBFF Disabled
> >>>>>>>>>>>              AtomicOpsCtl: ReqEn-
> >>>>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>>>>>              Transmit Margin: Normal Operating Range,
> >>>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>>>>>              Compliance De-emphasis: -6dB
> >>>>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>>>>>         Vector table: BAR=4 offset=00000000
> >>>>>>>>>>>         PBA: BAR=4 offset=00000800
> >>>>>>>>>>>     Capabilities: [d0] Vital Product Data
> >>>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>>>>>         Not readable
> >>>>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>>>>>     Capabilities: [140 v1] Virtual Channel
> >>>>>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>>>>>>>>>         Ctrl:    ArbSelect=Fixed
> >>>>>>>>>>>         Status:    InProgress-
> >>>>>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>>>>>             Status:    NegoPending- InProgress-
> >>>>>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>>>>>         Max snoop latency: 71680ns
> >>>>>>>>>>>         Max no snoop latency: 71680ns
> >>>>>>>>>>>     Kernel driver in use: r8169
> >>>>>>>>>>>     Kernel modules: r8169
> >>>>>>>>>>>
> >>>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks!
> >>>>>>>>>>>
> >>>>>>>>>>> Peter.
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>>>>> situation.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> lshw shows:
> >>>>>>>>>>>>>        description: Ethernet interface
> >>>>>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>>>>        physical id: 0
> >>>>>>>>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>>>>>>>        logical name: enp3s0
> >>>>>>>>>>>>>        version: 0c
> >>>>>>>>>>>>>        serial:
> >>>>>>>>>>>>>        size: 1Gbit/s
> >>>>>>>>>>>>>        capacity: 1Gbit/s
> >>>>>>>>>>>>>        width: 64 bits
> >>>>>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kind Regards,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Peter.
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>
> >>>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Can you provide any measurements?
> >>>>>>>>>>>> - iperf results before and after
> >>>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Heiner
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> > 
> 
> 
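
The rx_missed heuristic mentioned above can be sketched from the
"ethtool -S" counters. This is only an illustration: it assumes rx_missed
is not already included in rx_packets, and it says nothing about what
percentage would count as a problem. The function names are hypothetical.

```python
def parse_ethtool_stats(text):
    """Turn `ethtool -S <if>` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # The "NIC statistics:" header has no value and is skipped here.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def rx_missed_percent(stats):
    """Share of missed frames, assuming total = rx_packets + rx_missed."""
    missed = stats.get("rx_missed", 0)
    total = stats.get("rx_packets", 0) + missed
    return 100.0 * missed / total if total else 0.0


sample = """NIC statistics:
     tx_packets: 2749
     rx_packets: 4089
     rx_errors: 0
     rx_missed: 0
"""
print(rx_missed_percent(parse_ethtool_stats(sample)))  # prints: 0.0
```

With Peter's counters the result is 0.0, which matches the observation
that an ASPM problem does not always show up as rx_missed.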

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31  6:35                   ` Heiner Kallweit
  2019-01-31  6:49                     ` Heiner Kallweit
  2019-01-31  7:23                     ` David Chang
@ 2019-02-01  4:29                     ` David Chang
  2019-02-01  6:32                       ` Heiner Kallweit
  2 siblings, 1 reply; 25+ messages in thread
From: David Chang @ 2019-02-01  4:29 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Peter Ceiley, Realtek linux nic maintainers, netdev, Martti Laaksonen

On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
> Hi David, two more things:
> 
> 1. Could you please test a recent linux-next kernel?

Not tested yet. Will do if possible.

> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>    and compare them.

For your information:

[with pcie_aspm=off]
--- v4.18.15	2019-02-01 12:11:56.019051828 +0800
+++ v4.9.11	2019-02-01 12:12:26.827439645 +0800
@@ -3,18 +3,19 @@
 Offset          Values
 ------          ------
 0x0000:         ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
-0x0010:         00 10 38 0e 04 00 00 00 78 00 06 00 00 00 00 00
-0x0020:         00 f0 9b f6 03 00 00 00 00 00 00 00 00 00 00 00
+0x0010:         00 f0 ba 0d 04 00 00 00 78 00 06 00 00 00 00 00
+0x0020:         00 d0 35 f7 03 00 00 00 00 00 00 00 00 00 00 00
 0x0030:         00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
-0x0040:         80 0f 10 57 0e cf 02 00 00 cf ba 34 00 00 00 00
-0x0050:         10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
+0x0040:         80 0f 10 57 0e cf 02 00 00 d8 c7 50 00 00 00 00
+0x0050:         10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
 0x0060:         00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
-0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
+0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 76 d0
 0x0080:         8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x0090:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x00a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-0x00b0:         7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
+0x00b0:         7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
 0x00c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x00d0:         21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
-0x00e0:         e1 20 51 51 00 30 94 f6 03 00 00 00 27 00 00 00
+0x00e0:         e1 20 51 51 00 e0 35 f7 03 00 00 00 27 00 00 00
 0x00f0:         3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00

[pcie_aspm.policy=performance]
--- v4.18.15-p	2019-02-01 12:18:46.919221060 +0800
+++ v4.9.11-p	2019-02-01 12:19:09.207474824 +0800
@@ -3,18 +3,19 @@
 Offset          Values
 ------          ------
 0x0000:         ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
-0x0010:         00 f0 bc 0d 04 00 00 00 78 00 06 00 00 00 00 00
-0x0020:         00 60 2e f7 03 00 00 00 00 00 00 00 00 00 00 00
+0x0010:         00 c0 22 09 04 00 00 00 78 00 06 00 00 00 00 00
+0x0020:         00 f0 e5 f4 03 00 00 00 00 00 00 00 00 00 00 00
 0x0030:         00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
-0x0040:         80 0f 10 57 0e cf 02 00 00 53 50 1a 00 00 00 00
-0x0050:         10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
+0x0040:         80 0f 10 57 0e cf 02 00 00 d2 35 7b 00 00 00 00
+0x0050:         10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
 0x0060:         00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
-0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
+0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 a4 a0
 0x0080:         8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x0090:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x00a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-0x00b0:         7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
+0x00b0:         7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
 0x00c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 0x00d0:         21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
-0x00e0:         e1 20 51 51 00 70 2e f7 03 00 00 00 27 00 00 00
+0x00e0:         e1 20 51 51 00 00 e6 f4 03 00 00 00 27 00 00 00
 0x00f0:         3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00

Thanks,
David
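
To read the register diffs above byte by byte, the two dumps can be
compared per address. A minimal sketch, assuming both dumps use the raw
"0xNNNN: <16 hex bytes>" layout shown above; the function names are
illustrative only, and the meaning of each register is left open:

```python
def parse_register_dump(text):
    """Map each row offset (e.g. 0x0050) to its list of byte values."""
    rows = {}
    for line in text.splitlines():
        offset, sep, values = line.partition(":")
        offset = offset.strip()
        if sep and offset.startswith("0x"):
            try:
                rows[int(offset, 16)] = [int(b, 16) for b in values.split()]
            except ValueError:
                continue  # not a raw hex row; ignore it
    return rows


def changed_registers(old, new):
    """Return (address, old_byte, new_byte) for every byte that differs."""
    changes = []
    for offset in sorted(old.keys() & new.keys()):
        for index, (a, b) in enumerate(zip(old[offset], new[offset])):
            if a != b:
                changes.append((offset + index, a, b))
    return changes


old = parse_register_dump(
    "0x0050:         10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00")
new = parse_register_dump(
    "0x0050:         10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00")
for address, a, b in changed_registers(old, new):
    print(f"0x{address:02x}: {a:02x} -> {b:02x}")
# prints:
# 0x53: 18 -> 98
# 0x56: 00 -> 01
```

Rows that hold DMA addresses or counters change on every boot, so only
differences that appear in both comparisons are likely to be
configuration changes.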

> Heiner
> 
> 
> On 31.01.2019 07:21, Heiner Kallweit wrote:
> > David, thanks for the link to the bug ticket.
> > I think only a proper bisect can help to find the offending commit.
> > 
> > Heiner
> > 
> > 
> > On 31.01.2019 03:32, David Chang wrote:
> >> Hi,
> >>
> >> We had a similar case here.
> >> - Realtek r8169 receive performance regression in kernel 4.19
> >>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> >>
> >> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> >> The major symptom is a high rx_missed count.
> >>
> >>
> >> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >>> Hi Peter,
> >>>
> >>> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >>> do the trick, can you also check with pcie_aspm.policy=performance.
> >>
> >> We will give it a try later.
> >>
> >>> And please check with "ethtool -S <if>" whether the chip statistics
> >>> show a significant number of errors.
> >>>
> >>> If this doesn't help you may have to bisect to find the offending commit.
> >>
> >> We tried falling the driver back to a few previous commits, as follows,
> >> but with no luck.
> >>
> >> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> >> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> >> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> >> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> >> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> >>
> >> Thanks,
> >> David Chang
> >>
> >>>
> >>> Heiner
> >>>
> >>>
> >>> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>>> Hi Heiner,
> >>>>
> >>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>>> and this made no difference.
> >>>>
> >>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>>> confirm that this immediately resolved the issue and access to the NFS
> >>>> shares operated as expected.
> >>>>
> >>>> I presume this means it is an issue with the r8169 driver included in
> >>>> 4.19 onwards?
> >>>>
> >>>> To answer your last questions:
> >>>>
> >>>> Base Board Information
> >>>>     Manufacturer: Alienware
> >>>>     Product Name: 0PGRP5
> >>>>     Version: A02
> >>>>
> >>>> ... and yes, the RTL8168 is the onboard network chip.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Peter.
> >>>>
> >>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>
> >>>>> Hi Peter,
> >>>>>
> >>>>> I think the vendor driver doesn't enable ASPM by default.
> >>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>>> A few older systems seem to have issues with ASPM. What kind of
> >>>>> system / mainboard are you using? Is the RTL8168 the onboard
> >>>>> network chip?
> >>>>>
> >>>>> Rgds, Heiner
> >>>>>
> >>>>>
> >>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>>> Hi Heiner,
> >>>>>>
> >>>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>>> a good idea.
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Peter.
> >>>>>>
> >>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Peter,
> >>>>>>>
> >>>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>>> What you could do:
> >>>>>>>
> >>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>>
> >>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>>
> >>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>>
> >>>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>>> elsewhere in the network subsystem?
> >>>>>>>
> >>>>>>> Heiner
> >>>>>>>
> >>>>>>>
> >>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>>> Hi Heiner,
> >>>>>>>>
> >>>>>>>> Thanks for getting back to me.
> >>>>>>>>
> >>>>>>>> No, I don't use jumbo packets.
> >>>>>>>>
> >>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>>
> >>>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>>> troubleshoot this issue. Running the following
> >>>>>>>>
> >>>>>>>>     netstat -s |grep retransmitted
> >>>>>>>>
> >>>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>>> 4.19.18:
> >>>>>>>>
> >>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>>> the following:
> >>>>>>>>     real    0m19.867s
> >>>>>>>>     user    0m0.012s
> >>>>>>>>     sys    0m0.036s
> >>>>>>>>
> >>>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>>     real    0m0.300s
> >>>>>>>>     user    0m0.004s
> >>>>>>>>     sys    0m0.007s
> >>>>>>>>
> >>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>>
> >>>>>>>> dmesg XID:
> >>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>>
> >>>>>>>> # lspci -vv
> >>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>>     Interrupt: pin A routed to IRQ 19
> >>>>>>>>     Region 0: I/O ports at d000 [size=256]
> >>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>>     Capabilities: [40] Power Management version 3
> >>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>>         Address: 0000000000000000  Data: 0000
> >>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>>> <512ns, L1 <64us
> >>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>>> SlotPowerLimit 10.000W
> >>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>>> OBFF Via message/WAKE#
> >>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>>> OBFF Disabled
> >>>>>>>>              AtomicOpsCtl: ReqEn-
> >>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>>              Transmit Margin: Normal Operating Range,
> >>>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>>              Compliance De-emphasis: -6dB
> >>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>>         Vector table: BAR=4 offset=00000000
> >>>>>>>>         PBA: BAR=4 offset=00000800
> >>>>>>>>     Capabilities: [d0] Vital Product Data
> >>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>>         Not readable
> >>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>>     Capabilities: [140 v1] Virtual Channel
> >>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>>>>>>         Ctrl:    ArbSelect=Fixed
> >>>>>>>>         Status:    InProgress-
> >>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>>             Status:    NegoPending- InProgress-
> >>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>>         Max snoop latency: 71680ns
> >>>>>>>>         Max no snoop latency: 71680ns
> >>>>>>>>     Kernel driver in use: r8169
> >>>>>>>>     Kernel modules: r8169
> >>>>>>>>
> >>>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>>
> >>>>>>>> Thanks!
> >>>>>>>>
> >>>>>>>> Peter.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>>
> >>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>>
> >>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>>> situation.
> >>>>>>>>>>
> >>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>>
> >>>>>>>>>> lshw shows:
> >>>>>>>>>>        description: Ethernet interface
> >>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>>        physical id: 0
> >>>>>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>>>>        logical name: enp3s0
> >>>>>>>>>>        version: 0c
> >>>>>>>>>>        serial:
> >>>>>>>>>>        size: 1Gbit/s
> >>>>>>>>>>        capacity: 1Gbit/s
> >>>>>>>>>>        width: 64 bits
> >>>>>>>>>>        clock: 33MHz
> >>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>>
> >>>>>>>>>> Kind Regards,
> >>>>>>>>>>
> >>>>>>>>>> Peter.
> >>>>>>>>>>
> >>>>>>>>> Hi Peter,
> >>>>>>>>>
> >>>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>>
> >>>>>>>>> - Can you provide any measurements?
> >>>>>>>>> - iperf results before and after
> >>>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>>> - Do you use jumbo packets?
> >>>>>>>>>
> >>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>>
> >>>>>>>>> Heiner
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> > 
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-02-01  4:29                     ` David Chang
@ 2019-02-01  6:32                       ` Heiner Kallweit
  0 siblings, 0 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-01  6:32 UTC (permalink / raw)
  To: David Chang
  Cc: Peter Ceiley, Realtek linux nic maintainers, netdev, Martti Laaksonen

Thanks, however the register diff is a little hard to read.
Usually ethtool -d outputs something like this:

RealTek RTL8168g/8111g registers:
--------------------------------------------------------
0x00: MAC Address                      00:01:2e:83:90:11
0x08: Multicast Address Filter     0x100000c0 0x00000084
0x10: Dump Tally Counter Command   0x78c43000 0x00000001
0x20: Tx Normal Priority Ring Addr 0x77cc4000 0x00000001
0x28: Tx High Priority Ring Addr   0x00000000 0x00000000
0x30: Flash memory read/write                 0x00000000
0x34: Early Rx Byte Count                              0
0x36: Early Rx Status                               0x00
0x37: Command                                       0x0c
      Rx on, Tx on
0x3C: Interrupt Mask                              0x003f
      LinkChg RxNoBuf TxErr TxOK RxErr RxOK
0x3E: Interrupt Status                            0x0000

0x40: Tx Configuration                        0x4f000f80
0x44: Rx Configuration                        0x0002cf0e
0x48: Timer count                             0x00000000
0x4C: Missed packet counter                     0x000000
0x50: EEPROM Command                                0x10
0x51: Config 0                                      0x00
0x52: Config 1                                      0xcf
0x53: Config 2                                      0x9c
0x54: Config 3                                      0x60
0x55: Config 4                                      0x50
0x56: Config 5                                      0x01
0x58: Timer interrupt                         0x00000000
0x5C: Multiple Interrupt Select                   0x0000
0x60: PHY access                              0x00000000
0x64: TBI control and status                  0x00000000
0x68: TBI Autonegotiation advertisement (ANAR)    0x0000
0x6A: TBI Link partner ability (LPAR)             0x0000
0x6C: PHY status                                    0xf3
..
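
As a rough illustration of how the raw rows in the diff can be matched
against a decoded listing like the one above, here is a minimal Python
sketch. It assumes the register names at offsets 0x50-0x56 from the
RTL8168g/8111g listing also apply to the chip in question (RTL8168h/8111h),
which is not confirmed here. The helper names parse_row and diff_rows are
made up for this example, and the two rows are copied from the 0x0050 line
of the diff (the "-" row is from the v4.18.15 dump, the "+" row from the
other dump).

```python
# Register names for byte offsets 0x50-0x56, copied from the decoded
# "ethtool -d" listing above. Assumption: the same layout applies to the
# chip whose raw dump is being compared.
REGISTER_NAMES = {
    0x50: "EEPROM Command",
    0x51: "Config 0",
    0x52: "Config 1",
    0x53: "Config 2",
    0x54: "Config 3",
    0x55: "Config 4",
    0x56: "Config 5",
}


def parse_row(line):
    """Parse a raw dump row like '0x0050:  10 00 cf ...' into (base, bytes)."""
    offset, _, values = line.partition(":")
    base = int(offset.strip(), 16)
    return base, [int(b, 16) for b in values.split()]


def diff_rows(old_line, new_line):
    """Return (offset, old, new, name) for every byte that differs."""
    old_base, old_bytes = parse_row(old_line)
    new_base, new_bytes = parse_row(new_line)
    if old_base != new_base:
        raise ValueError("rows start at different offsets")
    changes = []
    for i, (old, new) in enumerate(zip(old_bytes, new_bytes)):
        if old != new:
            offset = old_base + i
            name = REGISTER_NAMES.get(offset, "unknown")
            changes.append((offset, old, new, name))
    return changes


# The 0x0050 row from the diff; both ASPM settings show the same change.
OLD_ROW = "0x0050:         10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00"
NEW_ROW = "0x0050:         10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00"

for offset, old, new, name in diff_rows(OLD_ROW, NEW_ROW):
    print(f"0x{offset:02x} {name}: 0x{old:02x} -> 0x{new:02x}")
```

For this row the sketch reports two changed bytes: 0x53 (Config 2,
0x18 -> 0x98) and 0x56 (Config 5, 0x00 -> 0x01). The other differing rows
hold values that look like ring addresses and counters, which are expected
to vary between boots.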


On 01.02.2019 05:29, David Chang wrote:
> On Jan 31, 2019 at 07:35:30 +0100, Heiner Kallweit wrote:
>> Hi David, two more things:
>>
>> 1. Could you please test a recent linux-next kernel?
> 
> Not tested yet. Will do if possible.
> 
>> 2. Please get a register dump (ethtool -d <if>) from 4.18 and 4.19
>>    and compare them.
> 
> For your information.
> 
> [with pcie_aspm=off]
> --- v4.18.15	2019-02-01 12:11:56.019051828 +0800
> +++ v4.9.11	2019-02-01 12:12:26.827439645 +0800
> @@ -3,18 +3,19 @@
>  Offset          Values
>  ------          ------
>  0x0000:         ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
> -0x0010:         00 10 38 0e 04 00 00 00 78 00 06 00 00 00 00 00
> -0x0020:         00 f0 9b f6 03 00 00 00 00 00 00 00 00 00 00 00
> +0x0010:         00 f0 ba 0d 04 00 00 00 78 00 06 00 00 00 00 00
> +0x0020:         00 d0 35 f7 03 00 00 00 00 00 00 00 00 00 00 00
>  0x0030:         00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
> -0x0040:         80 0f 10 57 0e cf 02 00 00 cf ba 34 00 00 00 00
> -0x0050:         10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
> +0x0040:         80 0f 10 57 0e cf 02 00 00 d8 c7 50 00 00 00 00
> +0x0050:         10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
>  0x0060:         00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
> -0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
> +0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 76 d0
>  0x0080:         8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0x0090:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0x00a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> -0x00b0:         7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
> +0x00b0:         7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
>  0x00c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0x00d0:         21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
> -0x00e0:         e1 20 51 51 00 30 94 f6 03 00 00 00 27 00 00 00
> +0x00e0:         e1 20 51 51 00 e0 35 f7 03 00 00 00 27 00 00 00
>  0x00f0:         3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00
> 
> [pcie_aspm.policy=performance]
> --- v4.18.15-p	2019-02-01 12:18:46.919221060 +0800
> +++ v4.9.11-p	2019-02-01 12:19:09.207474824 +0800
> @@ -3,18 +3,19 @@
>  Offset          Values
>  ------          ------
>  0x0000:         ec 8e b5 5a 2c f5 00 00 48 00 40 00 80 00 80 00
> -0x0010:         00 f0 bc 0d 04 00 00 00 78 00 06 00 00 00 00 00
> -0x0020:         00 60 2e f7 03 00 00 00 00 00 00 00 00 00 00 00
> +0x0010:         00 c0 22 09 04 00 00 00 78 00 06 00 00 00 00 00
> +0x0020:         00 f0 e5 f4 03 00 00 00 00 00 00 00 00 00 00 00
>  0x0030:         00 00 00 00 00 00 00 0c 00 00 00 00 3f 80 00 00
> -0x0040:         80 0f 10 57 0e cf 02 00 00 53 50 1a 00 00 00 00
> -0x0050:         10 00 cf 18 60 11 00 01 11 11 11 00 00 00 00 00
> +0x0040:         80 0f 10 57 0e cf 02 00 00 d2 35 7b 00 00 00 00
> +0x0050:         10 00 cf 98 60 11 01 01 11 11 11 00 00 00 00 00
>  0x0060:         00 00 00 00 3c 10 00 81 2c f0 00 80 93 00 80 f0
> -0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 00 00
> +0x0070:         00 6f 00 c4 b0 31 00 00 07 00 00 00 00 00 a4 a0
>  0x0080:         8b 06 01 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0x0090:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0x00a0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> -0x00b0:         7f 04 00 00 00 00 00 00 e1 c1 05 d2 00 00 00 00
> +0x00b0:         7f 04 00 00 00 00 00 00 ad 79 01 d2 00 00 00 00
>  0x00c0:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>  0x00d0:         21 00 00 32 0e 00 00 00 00 00 00 40 06 11 fd 00
> -0x00e0:         e1 20 51 51 00 70 2e f7 03 00 00 00 27 00 00 00
> +0x00e0:         e1 20 51 51 00 00 e6 f4 03 00 00 00 27 00 00 00
>  0x00f0:         3f 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00
> 
> Thanks,
> David
> 
>> Heiner
>>
>>
>> On 31.01.2019 07:21, Heiner Kallweit wrote:
>>> David, thanks for the link to the bug ticket.
>>> I think only a proper bisect can help to find the offending commit.
>>>
>>> Heiner
>>>
>>>
>>> On 31.01.2019 03:32, David Chang wrote:
>>>> Hi,
>>>>
>>>> We had a similr case here.
>>>> - Realtek r8169 receive performance regression in kernel 4.19
>>>>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>>>
>>>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>>>> The major symptom is there are many rx_missed count.
>>>>
>>>>
>>>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>>>> Hi Peter,
>>>>>
>>>>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>>>>> do the trick, can you also check with pcie_aspm.policy=performance.
>>>>
>>>> We will give it a try later.
>>>>
>>>>> And please check with "ethtool -S <if>" whether the chip statistics
>>>>> show a significant number of errors.
>>>>>
>>>>> If this doesn't help you may have to bisect to find the offending commit.
>>>>
>>>> We had tried fallback driver to a few previous commits as following,
>>>> but with no luck.
>>>>
>>>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>>>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>>>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>>>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>>>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>>>
>>>> Thanks,
>>>> David Chang
>>>>
>>>>>
>>>>> Heiner
>>>>>
>>>>>
>>>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>>>> Hi Heiner,
>>>>>>
>>>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>>>> and this made no difference.
>>>>>>
>>>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>>>> confirm that this immediately resolved the issue and access to the NFS
>>>>>> shares operated as expected.
>>>>>>
>>>>>> I presume this means it is an issue with the r8169 driver included in
>>>>>> 4.19 onwards?
>>>>>>
>>>>>> To answer your last questions:
>>>>>>
>>>>>> Base Board Information
>>>>>>     Manufacturer: Alienware
>>>>>>     Product Name: 0PGRP5
>>>>>>     Version: A02
>>>>>>
>>>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Peter.
>>>>>>
>>>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>>>> Few older systems seem to have issues with ASPM, what kind of
>>>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>>>> network chip?
>>>>>>>
>>>>>>> Rgds, Heiner
>>>>>>>
>>>>>>>
>>>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>>>> Hi Heiner,
>>>>>>>>
>>>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>>>> a good idea.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>>>> What you could do:
>>>>>>>>>
>>>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>>>
>>>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>>>
>>>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>>>
>>>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>>>> elsewhere in the network subsystem?
>>>>>>>>>
>>>>>>>>> Heiner
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>>>> Hi Heiner,
>>>>>>>>>>
>>>>>>>>>> Thanks for getting back to me.
>>>>>>>>>>
>>>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>>>
>>>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>>>
>>>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>>>
>>>>>>>>>>     netstat -s |grep retransmitted
>>>>>>>>>>
>>>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>>>> 4.19.18:
>>>>>>>>>>
>>>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>>>> the following:
>>>>>>>>>>     real    0m19.867s
>>>>>>>>>>     user    0m0.012s
>>>>>>>>>>     sys    0m0.036s
>>>>>>>>>>
>>>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>>>     real    0m0.300s
>>>>>>>>>>     user    0m0.004s
>>>>>>>>>>     sys    0m0.007s
>>>>>>>>>>
>>>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>>>
>>>>>>>>>> dmesg XID:
>>>>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>>>
>>>>>>>>>> # lspci -vv
>>>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>>>         Address: 0000000000000000  Data: 0000
>>>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>>>> <512ns, L1 <64us
>>>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>>>> OBFF Disabled
>>>>>>>>>>              AtomicOpsCtl: ReqEn-
>>>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>>>              Transmit Margin: Normal Operating Range,
>>>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>>>              Compliance De-emphasis: -6dB
>>>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>>>         Not readable
>>>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>>>>>>         Status:    InProgress-
>>>>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>>>             Status:    NegoPending- InProgress-
>>>>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>>>         Max snoop latency: 71680ns
>>>>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>>>>     Kernel driver in use: r8169
>>>>>>>>>>     Kernel modules: r8169
>>>>>>>>>>
>>>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> Peter.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>>>
>>>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>>>
>>>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>>>> situation.
>>>>>>>>>>>>
>>>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>>>
>>>>>>>>>>>> lshw shows:
>>>>>>>>>>>>        description: Ethernet interface
>>>>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>>>        physical id: 0
>>>>>>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>>>>>>        logical name: enp3s0
>>>>>>>>>>>>        version: 0c
>>>>>>>>>>>>        serial:
>>>>>>>>>>>>        size: 1Gbit/s
>>>>>>>>>>>>        capacity: 1Gbit/s
>>>>>>>>>>>>        width: 64 bits
>>>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>>>
>>>>>>>>>>>> Kind Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Peter.
>>>>>>>>>>>>
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>>>
>>>>>>>>>>> - Can you provide any measurements?
>>>>>>>>>>> - iperf results before and after
>>>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>>>
>>>>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>>>
>>>>>>>>>>> Heiner
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
> 



* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31  2:32               ` David Chang
  2019-01-31  6:21                 ` Heiner Kallweit
@ 2019-02-02 12:25                 ` Heiner Kallweit
  2019-02-05 18:50                 ` Heiner Kallweit
  2 siblings, 0 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-02 12:25 UTC (permalink / raw)
  To: David Chang; +Cc: Realtek linux nic maintainers, netdev

Hi David,

to check another potential incompatibility: could you please test a
4.19 kernel with the following line disabled?

Rgds, Heiner

---
 drivers/net/ethernet/realtek/r8169.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index e8a112149..6ef89f518 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5334,7 +5334,7 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
 	r8168_mac_ocp_write(tp, 0xc094, 0x0000);
 	r8168_mac_ocp_write(tp, 0xc09e, 0x0000);
 
-	rtl_hw_aspm_clkreq_enable(tp, true);
+	// rtl_hw_aspm_clkreq_enable(tp, true);
 }
 
 static void rtl_hw_start_8168ep(struct rtl8169_private *tp)
-- 
2.20.1
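
Before building, it may be worth confirming that the tree really has the
call disabled. The following is a minimal sketch, not part of the patch:
the helper name aspm_call_active_in_8168h is made up for illustration, and
it assumes the function definition starts with
"static void rtl_hw_start_8168h_1(" at the beginning of a line, as in the
hunk header above.

```shell
# Hypothetical helper (not part of the kernel tree or any tool).
# Exit status 0 if the given r8169.c still contains an uncommented
#   rtl_hw_aspm_clkreq_enable(tp, true);
# inside rtl_hw_start_8168h_1(), exit status 1 otherwise.
aspm_call_active_in_8168h() {
    awk '
        # Enter the function body at its definition line.
        /^static void rtl_hw_start_8168h_1\(/ { in_fn = 1 }
        # A closing brace in column 0 ends the function.
        in_fn && /^[}]/ { in_fn = 0 }
        # Only an uncommented call matches; "// rtl_hw_..." does not.
        in_fn && /^[ \t]*rtl_hw_aspm_clkreq_enable\(tp, true\);/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example use from the top of a kernel tree:
#   if aspm_call_active_in_8168h drivers/net/ethernet/realtek/r8169.c; then
#       echo "call still active, patch not applied"
#   fi
```

The check is purely textual, so it only recognises the "//" style of
disabling used in the patch above.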


On 31.01.2019 03:32, David Chang wrote:
> Hi,
> 
> We had a similar case here.
> - Realtek r8169 receive performance regression in kernel 4.19
>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> 
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> The major symptom is a large rx_missed count.
> 
> 
> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>> Hi Peter,
>>
>> recently I had a case where pcie_aspm=off for whatever reason didn't
>> do the trick. Can you also check with pcie_aspm.policy=performance?
> 
> We will give it a try later.
> 
>> And please check with "ethtool -S <if>" whether the chip statistics
>> show a significant number of errors.
>>
>> If this doesn't help you may have to bisect to find the offending commit.
> 
> We tried falling back to the driver as of a few earlier commits, listed
> below, but with no luck.
> 
> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> 
> Thanks,
> David Chang
> 
>>
>> Heiner
>>
>>
>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>> and this made no difference.
>>>
>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>> confirm that this immediately resolved the issue and access to the NFS
>>> shares operated as expected.
>>>
>>> I presume this means it is an issue with the r8169 driver included in
>>> 4.19 onwards?
>>>
>>> To answer your last questions:
>>>
>>> Base Board Information
>>>     Manufacturer: Alienware
>>>     Product Name: 0PGRP5
>>>     Version: A02
>>>
>>> ... and yes, the RTL8168 is the onboard network chip.
>>>
>>> Regards,
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> I think the vendor driver doesn't enable ASPM by default.
>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>> A few older systems seem to have issues with ASPM. What kind of
>>>> system / mainboard are you using? Is the RTL8168 the onboard
>>>> network chip?
>>>>
>>>> Rgds, Heiner
>>>>
>>>>
>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>> a good idea.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>> What you could do:
>>>>>>
>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>
>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>
>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>
>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>> elsewhere in the network subsystem?
>>>>>>
>>>>>> Heiner
>>>>>>
>>>>>>
>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks for getting back to me.
>>>>>>>
>>>>>>> No, I don't use jumbo packets.
>>>>>>>
>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>
>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>> troubleshoot this issue. Running the following
>>>>>>>
>>>>>>>     netstat -s |grep retransmitted
>>>>>>>
>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>> 4.19.18:
>>>>>>>
>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>> the following:
>>>>>>>     real    0m19.867s
>>>>>>>     user    0m0.012s
>>>>>>>     sys    0m0.036s
>>>>>>>
>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>     real    0m0.300s
>>>>>>>     user    0m0.004s
>>>>>>>     sys    0m0.007s
>>>>>>>
>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>
>>>>>>> dmesg XID:
>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>
>>>>>>> # lspci -vv
>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>         Address: 0000000000000000  Data: 0000
>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>> <512ns, L1 <64us
>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>> SlotPowerLimit 10.000W
>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>> OBFF Via message/WAKE#
>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>> OBFF Disabled
>>>>>>>              AtomicOpsCtl: ReqEn-
>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>              Transmit Margin: Normal Operating Range,
>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>              Compliance De-emphasis: -6dB
>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>         Not readable
>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>>>         Status:    InProgress-
>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>             Status:    NegoPending- InProgress-
>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>         Max snoop latency: 71680ns
>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>     Kernel driver in use: r8169
>>>>>>>     Kernel modules: r8169
>>>>>>>
>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>
>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>
>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>> situation.
>>>>>>>>>
>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>
>>>>>>>>> lshw shows:
>>>>>>>>>        description: Ethernet interface
>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>        physical id: 0
>>>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>>>        logical name: enp3s0
>>>>>>>>>        version: 0c
>>>>>>>>>        serial:
>>>>>>>>>        size: 1Gbit/s
>>>>>>>>>        capacity: 1Gbit/s
>>>>>>>>>        width: 64 bits
>>>>>>>>>        clock: 33MHz
>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>
>>>>>>>> - Can you provide any measurements?
>>>>>>>> - iperf results before and after
>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>> - Do you use jumbo packets?
>>>>>>>>
>>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>
>>>>>>>> Heiner
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 



* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-01-31  2:32               ` David Chang
  2019-01-31  6:21                 ` Heiner Kallweit
  2019-02-02 12:25                 ` Heiner Kallweit
@ 2019-02-05 18:50                 ` Heiner Kallweit
  2019-02-05 18:53                   ` Heiner Kallweit
                                     ` (2 more replies)
  2 siblings, 3 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-05 18:50 UTC (permalink / raw)
  To: David Chang; +Cc: Realtek linux nic maintainers, netdev

Hi David,

meanwhile there's the following bug report matching what you reported.
It's even the same chip version (RTL8168h).
https://bugzilla.redhat.com/show_bug.cgi?id=1671958

The symptom there is also a significant number of rx_missed packets.
Could you try what I suggested there most recently: build a kernel with the call to
rtl_hw_aspm_clkreq_enable(tp, true) at the end of rtl_hw_start_8168h_1() disabled?
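
To compare the rx_missed counter between the unmodified and the modified
kernel, something along these lines may help. It is only a sketch: the
helper names are made up, eth0 is an example interface name, and it assumes
"ethtool -S <if>" prints the counter as a line of the form "rx_missed: N".

```shell
# Hypothetical helpers (not part of ethtool or the r8169 driver).

# Read "ethtool -S <if>" style output on stdin and print the rx_missed value.
rx_missed_from_stats() {
    awk -F': *' '$1 ~ /^[ \t]*rx_missed$/ { print $2; exit }'
}

# Print how much a counter grew between two samples: counter_delta OLD NEW
counter_delta() {
    echo $(( $2 - $1 ))
}

# Example use (needs ethtool and a real interface):
#   before=$(ethtool -S eth0 | rx_missed_from_stats)
#   ... run the receive workload ...
#   after=$(ethtool -S eth0 | rx_missed_from_stats)
#   counter_delta "$before" "$after"
```

Running the same workload on both kernels and comparing the two deltas
gives a number that is easier to discuss than "many rx_missed".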

Heiner


On 31.01.2019 03:32, David Chang wrote:
> Hi,
> 
> We had a similar case here.
> - Realtek r8169 receive performance regression in kernel 4.19
>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> 
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> The major symptom is a large rx_missed count.
> 
> 
> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>> Hi Peter,
>>
>> recently I had a case where pcie_aspm=off for whatever reason didn't
>> do the trick. Can you also check with pcie_aspm.policy=performance?
> 
> We will give it a try later.
> 
>> And please check with "ethtool -S <if>" whether the chip statistics
>> show a significant number of errors.
>>
>> If this doesn't help you may have to bisect to find the offending commit.
> 
> We tried falling back to the driver as of a few earlier commits, listed
> below, but with no luck.
> 
> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> 
> Thanks,
> David Chang
> 



* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-02-05 18:50                 ` Heiner Kallweit
@ 2019-02-05 18:53                   ` Heiner Kallweit
  2019-02-11  6:23                   ` David Chang
  2019-02-14  2:45                   ` David Chang
  2 siblings, 0 replies; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-05 18:53 UTC (permalink / raw)
  To: David Chang; +Cc: Realtek linux nic maintainers, netdev

By the way: I can't reproduce the issue on an RTL8168g.
So it doesn't seem to be an issue with generic code in the driver.
I would assume it's some kind of incompatibility between activated
chip settings (ASPM etc.) and certain systems.
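
When comparing systems, it can help to record what ASPM state the link is
actually in and which ASPM policy the kernel uses. A rough sketch follows.
The function names are invented for illustration, 01:00.0 is taken from the
dmesg line quoted below, and lspci normally has to run as root to show the
LnkCtl line. Note that this reports the PCIe link control state, which is
not the same thing as the chip-internal settings the driver writes.

```shell
# Hypothetical helpers (not part of pciutils or the kernel).

# Read "lspci -vv" output on stdin and print the ASPM part of the LnkCtl
# line, e.g. "L1 Enabled" or "Disabled".
aspm_state_from_lspci() {
    sed -n 's/^[[:space:]]*LnkCtl:[[:space:]]*ASPM \([^;]*\);.*/\1/p'
}

# Read the pcie_aspm policy string on stdin and print the active entry,
# which the kernel marks with square brackets.
active_aspm_policy() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Example use:
#   lspci -vv -s 01:00.0 | aspm_state_from_lspci
#   active_aspm_policy < /sys/module/pcie_aspm/parameters/policy
```

Posting both values together with the board name makes it easier to spot
which combinations are affected.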

Heiner



On 05.02.2019 19:50, Heiner Kallweit wrote:
> Hi David,
> 
> meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
> 
> The symptom there is also a significant number of rx_missed packets.
> Could you try what I suggested there most recently: build a kernel with the call to
> rtl_hw_aspm_clkreq_enable(tp, true) at the end of rtl_hw_start_8168h_1() disabled?
> 
> Heiner
> 
> 
> On 31.01.2019 03:32, David Chang wrote:
>> Hi,
>>
>> We had a similar case here.
>> - Realtek r8169 receive performance regression in kernel 4.19
>>   https://bugzilla.suse.com/show_bug.cgi?id=1119649
>>
>> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
>> The major symptom is a large rx_missed count.
>>
>>
>> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>>> Hi Peter,
>>>
>>> recently I had a case where pcie_aspm=off for whatever reason didn't
>>> do the trick. Can you also check with pcie_aspm.policy=performance?
>>
>> We will give it a try later.
>>
>>> And please check with "ethtool -S <if>" whether the chip statistics
>>> show a significant number of errors.
>>>
>>> If this doesn't help you may have to bisect to find the offending commit.
>>
>> We tried falling back to the driver as of a few earlier commits, listed
>> below, but with no luck.
>>
>> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
>> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
>> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
>> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
>> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>>
>> Thanks,
>> David Chang
>>
>>>
>>> Heiner
>>>
>>>
>>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>>> Hi Heiner,
>>>>
>>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>>> and this made no difference.
>>>>
>>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>>> confirm that this immediately resolved the issue and access to the NFS
>>>> shares operated as expected.
>>>>
>>>> I presume this means it is an issue with the r8169 driver included in
>>>> 4.19 onwards?
>>>>
>>>> To answer your last questions:
>>>>
>>>> Base Board Information
>>>>     Manufacturer: Alienware
>>>>     Product Name: 0PGRP5
>>>>     Version: A02
>>>>
>>>> ... and yes, the RTL8168 is the onboard network chip.
>>>>
>>>> Regards,
>>>>
>>>> Peter.
>>>>
>>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>
>>>>> Hi Peter,
>>>>>
>>>>> I think the vendor driver doesn't enable ASPM per default.
>>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>>> A few older systems seem to have issues with ASPM, what kind of
>>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>>> network chip?
>>>>>
>>>>> Rgds, Heiner
>>>>>
>>>>>
>>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>>> Hi Heiner,
>>>>>>
>>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>>> a good idea.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Peter.
>>>>>>
>>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Peter,
>>>>>>>
>>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>>> What you could do:
>>>>>>>
>>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>>
>>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>>
>>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>>
>>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>>> elsewhere in the network subsystem?
>>>>>>>
>>>>>>> Heiner
>>>>>>>
>>>>>>>
>>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>>> Hi Heiner,
>>>>>>>>
>>>>>>>> Thanks for getting back to me.
>>>>>>>>
>>>>>>>> No, I don't use jumbo packets.
>>>>>>>>
>>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>>
>>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>>> troubleshoot this issue. Running the following
>>>>>>>>
>>>>>>>>     netstat -s |grep retransmitted
>>>>>>>>
>>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>>> 4.19.18:
>>>>>>>>
>>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>>> the following:
>>>>>>>>     real    0m19.867s
>>>>>>>>     user    0m0.012s
>>>>>>>>     sys    0m0.036s
>>>>>>>>
>>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>>> 4.18.16 and 'time' showed:
>>>>>>>>     real    0m0.300s
>>>>>>>>     user    0m0.004s
>>>>>>>>     sys    0m0.007s
>>>>>>>>
>>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>>
>>>>>>>> dmesg XID:
>>>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>>
>>>>>>>> # lspci -vv
>>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>     Interrupt: pin A routed to IRQ 19
>>>>>>>>     Region 0: I/O ports at d000 [size=256]
>>>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>>>     Capabilities: [40] Power Management version 3
>>>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>>>         Address: 0000000000000000  Data: 0000
>>>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>>> <512ns, L1 <64us
>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>>> SlotPowerLimit 10.000W
>>>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>>> OBFF Via message/WAKE#
>>>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>>> OBFF Disabled
>>>>>>>>              AtomicOpsCtl: ReqEn-
>>>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>>>              Transmit Margin: Normal Operating Range,
>>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>>>              Compliance De-emphasis: -6dB
>>>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>>>         Vector table: BAR=4 offset=00000000
>>>>>>>>         PBA: BAR=4 offset=00000800
>>>>>>>>     Capabilities: [d0] Vital Product Data
>>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>>>         Not readable
>>>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
>>>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>>>     Capabilities: [140 v1] Virtual Channel
>>>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
>>>>>>>>         Ctrl:    ArbSelect=Fixed
>>>>>>>>         Status:    InProgress-
>>>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>>>             Status:    NegoPending- InProgress-
>>>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>>>         Max snoop latency: 71680ns
>>>>>>>>         Max no snoop latency: 71680ns
>>>>>>>>     Kernel driver in use: r8169
>>>>>>>>     Kernel modules: r8169
>>>>>>>>
>>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Peter.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>>
>>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>>
>>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>>> situation.
>>>>>>>>>>
>>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>>
>>>>>>>>>> lshw shows:
>>>>>>>>>>        description: Ethernet interface
>>>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>>>        physical id: 0
>>>>>>>>>>        bus info: pci@0000:03:00.0
>>>>>>>>>>        logical name: enp3s0
>>>>>>>>>>        version: 0c
>>>>>>>>>>        serial:
>>>>>>>>>>        size: 1Gbit/s
>>>>>>>>>>        capacity: 1Gbit/s
>>>>>>>>>>        width: 64 bits
>>>>>>>>>>        clock: 33MHz
>>>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>>>        resources: irq:19 ioport:d000(size=256)
>>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>>
>>>>>>>>>> Kind Regards,
>>>>>>>>>>
>>>>>>>>>> Peter.
>>>>>>>>>>
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>>
>>>>>>>>> - Can you provide any measurements?
>>>>>>>>> - iperf results before and after
>>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>>> - Do you use jumbo packets?
>>>>>>>>>
>>>>>>>>> Also helpful would be a "lspci -vv" output for the network card and
>>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>>
>>>>>>>>> Heiner
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-02-05 18:50                 ` Heiner Kallweit
  2019-02-05 18:53                   ` Heiner Kallweit
@ 2019-02-11  6:23                   ` David Chang
  2019-02-14  2:45                   ` David Chang
  2 siblings, 0 replies; 25+ messages in thread
From: David Chang @ 2019-02-11  6:23 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen

Hi Heiner,

Sorry for the late reply!

On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> Hi David,
> 
> meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
> 
> Symptom there is also a significant number of rx_missed packets.
> Could you try what I mentioned there last:
> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> end of rtl_hw_start_8168h_1() being disabled.

Will do.

Thanks,
David Chang


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-02-05 18:50                 ` Heiner Kallweit
  2019-02-05 18:53                   ` Heiner Kallweit
  2019-02-11  6:23                   ` David Chang
@ 2019-02-14  2:45                   ` David Chang
  2019-02-14  6:17                     ` Heiner Kallweit
  2 siblings, 1 reply; 25+ messages in thread
From: David Chang @ 2019-02-14  2:45 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen

Hi Heiner,

On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> Hi David,
> 
> meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
> 
> Symptom there is also a significant number of rx_missed packets.
> Could you try what I mentioned there last:
> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> end of rtl_hw_start_8168h_1() being disabled.

After disabling the ASPM call that you mentioned, we finally got a
positive test result, and the rx_missed errors are gone. Without the
patch, receive performance goes back to being bad.

kernel: r8169: loading out-of-tree module taints kernel.
kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
kernel: libphy: r8169: probed
kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off

NIC statistics:
     tx_packets: 1653804
     rx_packets: 1555966
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 1555884
     broadcast: 78
     multicast: 4
     tx_aborted: 0
     tx_underrun: 0
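
For anyone repeating this test, the counters above can be checked
mechanically instead of by eye. The following is a minimal sketch, not part
of ethtool or the driver: it parses text in the "ethtool -S <if>" layout
shown above and reports any error counters that are non-zero. The helper
names and the choice of which counters count as errors are assumptions made
for illustration.

```python
# Sketch only: parse_nic_stats() and nonzero_errors() are hypothetical helpers,
# not part of ethtool. Input is text in the "ethtool -S <if>" layout.

# Assumed selection of counters to treat as errors.
ERROR_COUNTERS = ("tx_errors", "rx_errors", "rx_missed", "align_errors",
                  "tx_aborted", "tx_underrun")


def parse_nic_stats(text):
    """Return {counter_name: int} from 'name: value' lines."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        # Skip the "NIC statistics:" header and any non-numeric line.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_errors(stats):
    """Return only the error counters that are present and non-zero."""
    return {k: v for k, v in stats.items() if k in ERROR_COUNTERS and v != 0}


if __name__ == "__main__":
    sample = """NIC statistics:
     tx_packets: 1653804
     rx_packets: 1555966
     rx_errors: 0
     rx_missed: 0
"""
    stats = parse_nic_stats(sample)
    print(nonzero_errors(stats) or "no error counters set")
```

Feeding it the output captured before and after the patch gives a quick
comparison of rx_missed between the two runs.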

iperf receive:
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.x.x.x, port 55516
[  5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   940 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   939 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
[  5]  10.00-11.00  sec   112 MBytes   941 Mbits/sec
[...]
[  5]  50.00-51.00  sec   112 MBytes   941 Mbits/sec
[  5]  51.00-52.00  sec   112 MBytes   941 Mbits/sec
[  5]  52.00-53.00  sec   112 MBytes   942 Mbits/sec
[  5]  53.00-54.00  sec   112 MBytes   941 Mbits/sec
[  5]  54.00-55.00  sec   111 MBytes   934 Mbits/sec
[  5]  55.00-56.00  sec   112 MBytes   942 Mbits/sec
[  5]  56.00-57.00  sec   112 MBytes   937 Mbits/sec
[  5]  57.00-58.00  sec   112 MBytes   941 Mbits/sec
[  5]  58.00-59.00  sec   111 MBytes   932 Mbits/sec
[  5]  59.00-60.00  sec   112 MBytes   942 Mbits/sec
[  5]  60.00-60.04  sec  4.06 MBytes   939 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-60.04  sec  6.57 GBytes   940 Mbits/sec                  receiver
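
As a sanity check on the summary line, the average bitrate can be recomputed
from the transfer size and the duration. This is a minimal sketch, assuming
the report uses binary units for bytes (1 GByte = 2**30 bytes) and decimal
units for bits (1 Mbit = 10**6 bits), which is consistent with the
per-second lines above; the helper name is hypothetical.

```python
def avg_mbits_per_sec(gbytes, seconds):
    """Average rate in Mbits/sec for a transfer given in GBytes (2**30 bytes)."""
    bits = gbytes * 2**30 * 8
    return bits / seconds / 1e6


if __name__ == "__main__":
    # Values taken from the receiver summary line: 6.57 GBytes in 60.04 sec.
    print(round(avg_mbits_per_sec(6.57, 60.04)))
```

The result rounds to 940, which agrees with the 940 Mbits/sec reported for
the receiver.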

regards,
David

> 
> Heiner
> 
> 
> On 31.01.2019 03:32, David Chang wrote:
> > Hi,
> > 
> > We had a similar case here.
> > - Realtek r8169 receive performance regression in kernel 4.19
> >   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> > 
> > kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > The major symptom is a high rx_missed count.
> > 
> > 
> > On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >> Hi Peter,
> >>
> >> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >> do the trick, can you also check with pcie_aspm.policy=performance.
> > 
> > We will give it a try later.
> > 
> >> And please check with "ethtool -S <if>" whether the chip statistics
> >> show a significant number of errors.
> >>
> >> If this doesn't help you may have to bisect to find the offending commit.
> > 
> > We tried falling back the driver to a few earlier commits, listed below,
> > but with no luck.
> > 
> > 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> > 
> > Thanks,
> > David Chang
> > 
> >>
> >> Heiner
> >>
> >>
> >> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>> and this made no difference.
> >>>
> >>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>> confirm that this immediately resolved the issue and access to the NFS
> >>> shares operated as expected.
> >>>
> >>> I presume this means it is an issue with the r8169 driver included in
> >>> 4.19 onwards?
> >>>
> >>> To answer your last questions:
> >>>
> >>> Base Board Information
> >>>     Manufacturer: Alienware
> >>>     Product Name: 0PGRP5
> >>>     Version: A02
> >>>
> >>> ... and yes, the RTL8168 is the onboard network chip.
> >>>
> >>> Regards,
> >>>
> >>> Peter.
> >>>
> >>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> I think the vendor driver doesn't enable ASPM per default.
> >>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>> A few older systems seem to have issues with ASPM. What kind of
> >>>> system / mainboard are you using? Is the RTL8168 the onboard
> >>>> network chip?
> >>>>
> >>>> Rgds, Heiner
> >>>>
> >>>>
> >>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>> Hi Heiner,
> >>>>>
> >>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>> a good idea.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> at first glance it doesn't look like a typical driver issue.
> >>>>>> What you could do:
> >>>>>>
> >>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>
> >>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>
> >>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>
> >>>>>> Any specific reason why you think the root cause is in the driver and not
> >>>>>> elsewhere in the network subsystem?
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>>
> >>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>> Hi Heiner,
> >>>>>>>
> >>>>>>> Thanks for getting back to me.
> >>>>>>>
> >>>>>>> No, I don't use jumbo packets.
> >>>>>>>
> >>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>
> >>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>> troubleshoot this issue. Running the following
> >>>>>>>
> >>>>>>>     netstat -s |grep retransmitted
> >>>>>>>
> >>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>> 4.19.18:
> >>>>>>>
> >>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>> the following:
> >>>>>>>     real    0m19.867s
> >>>>>>>     user    0m0.012s
> >>>>>>>     sys    0m0.036s
> >>>>>>>
> >>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>     real    0m0.300s
> >>>>>>>     user    0m0.004s
> >>>>>>>     sys    0m0.007s
> >>>>>>>
> >>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>
> >>>>>>> dmesg XID:
> >>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>
> >>>>>>> # lspci -vv
> >>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>     Interrupt: pin A routed to IRQ 19
> >>>>>>>     Region 0: I/O ports at d000 [size=256]
> >>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>     Capabilities: [40] Power Management version 3
> >>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>         Address: 0000000000000000  Data: 0000
> >>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>> <512ns, L1 <64us
> >>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>> SlotPowerLimit 10.000W
> >>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>> OBFF Via message/WAKE#
> >>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>> OBFF Disabled
> >>>>>>>              AtomicOpsCtl: ReqEn-
> >>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>              Transmit Margin: Normal Operating Range,
> >>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>              Compliance De-emphasis: -6dB
> >>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>         Vector table: BAR=4 offset=00000000
> >>>>>>>         PBA: BAR=4 offset=00000800
> >>>>>>>     Capabilities: [d0] Vital Product Data
> >>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>         Not readable
> >>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>     Capabilities: [140 v1] Virtual Channel
> >>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>>>>>         Ctrl:    ArbSelect=Fixed
> >>>>>>>         Status:    InProgress-
> >>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>             Status:    NegoPending- InProgress-
> >>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>         Max snoop latency: 71680ns
> >>>>>>>         Max no snoop latency: 71680ns
> >>>>>>>     Kernel driver in use: r8169
> >>>>>>>     Kernel modules: r8169
> >>>>>>>
> >>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>
> >>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>
> >>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>> situation.
> >>>>>>>>>
> >>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>
> >>>>>>>>> lshw shows:
> >>>>>>>>>        description: Ethernet interface
> >>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>        physical id: 0
> >>>>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>>>        logical name: enp3s0
> >>>>>>>>>        version: 0c
> >>>>>>>>>        serial:
> >>>>>>>>>        size: 1Gbit/s
> >>>>>>>>>        capacity: 1Gbit/s
> >>>>>>>>>        width: 64 bits
> >>>>>>>>>        clock: 33MHz
> >>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>
> >>>>>>>>> Kind Regards,
> >>>>>>>>>
> >>>>>>>>> Peter.
> >>>>>>>>>
> >>>>>>>> Hi Peter,
> >>>>>>>>
> >>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>
> >>>>>>>> - Can you provide any measurements?
> >>>>>>>> - iperf results before and after
> >>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>> - Do you use jumbo packets?
> >>>>>>>>
> >>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>
> >>>>>>>> Heiner
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> > 
> 
> 

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-02-14  2:45                   ` David Chang
@ 2019-02-14  6:17                     ` Heiner Kallweit
  2019-02-15  2:51                       ` David Chang
  0 siblings, 1 reply; 25+ messages in thread
From: Heiner Kallweit @ 2019-02-14  6:17 UTC (permalink / raw)
  To: David Chang; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen

Hi David,

On 14.02.2019 03:45, David Chang wrote:
> Hi Heiner,
> 
> On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
>> Hi David,
>>
>> meanwhile there's the following bug report matching what you reported.
>> It's even the same chip version (RTL8168h).
>> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
>>
>> Symptom there is also a significant number of rx_missed packets.
>> Could you try what I mentioned there last:
>> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
>> end of rtl_hw_start_8168h_1() being disabled.
> 
> After disabling the ASPM call that you mentioned, we finally got a
> positive test result, and the rx_missed errors were gone. Without
> the patch, the receive side goes back to bad performance.
> 
Good to know, thanks. I also checked with Realtek; they confirmed that their Windows
driver uses some heuristics to disable ASPM under high load. So it seems like there
is some hw issue. What is still open is whether this affects only certain chip versions.
Let's see whether they can provide more information.
Disabling ASPM in general would hurt notebook users, because past measurements
show that ASPM can save a significant amount of energy.
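
For anyone who wants to see which ASPM policy is active on their own system before experimenting, here is a minimal sketch. It assumes the kernel exposes the policy at /sys/module/pcie_aspm/parameters/policy, where the active entry is shown in square brackets. The function names are only illustrative and not part of any driver or tool.

```python
from pathlib import Path

# Usual sysfs location of the PCIe ASPM policy parameter (an assumption;
# the file is absent on kernels built without ASPM support).
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def parse_aspm_policy(text):
    """Split text such as 'default [performance] powersave' into
    (active_policy, list_of_available_policies)."""
    active = None
    available = []
    for token in text.split():
        # The kernel marks the active policy with square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Return (active, available), or None if the file cannot be read."""
    try:
        return parse_aspm_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_aspm_policy())
```

This only reports the global policy; it does not say whether the NIC's link actually entered L0s or L1, for which the LnkCtl line in "lspci -vv" is the better source.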

> kernel: r8169: loading out-of-tree module taints kernel.
> kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
> kernel: libphy: r8169: probed
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
> kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
> kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
> kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
> 
> NIC statistics:
>      tx_packets: 1653804
>      rx_packets: 1555966
>      tx_errors: 0
>      rx_errors: 0
>      rx_missed: 0
>      align_errors: 0
>      tx_single_collisions: 0
>      tx_multi_collisions: 0
>      unicast: 1555884
>      broadcast: 78
>      multicast: 4
>      tx_aborted: 0
>      tx_underrun: 0
> 
> iperf receive:
> -----------------------------------------------------------
> Server listening on 5201
> -----------------------------------------------------------
> Accepted connection from 10.x.x.x, port 55516
> [  5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec
> [  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
> [  5]   2.00-3.00   sec   112 MBytes   940 Mbits/sec
> [  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
> [  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
> [  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
> [  5]   6.00-7.00   sec   112 MBytes   939 Mbits/sec
> [  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
> [  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec
> [  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
> [  5]  10.00-11.00  sec   112 MBytes   941 Mbits/sec
> [...]
> [  5]  50.00-51.00  sec   112 MBytes   941 Mbits/sec
> [  5]  51.00-52.00  sec   112 MBytes   941 Mbits/sec
> [  5]  52.00-53.00  sec   112 MBytes   942 Mbits/sec
> [  5]  53.00-54.00  sec   112 MBytes   941 Mbits/sec
> [  5]  54.00-55.00  sec   111 MBytes   934 Mbits/sec
> [  5]  55.00-56.00  sec   112 MBytes   942 Mbits/sec
> [  5]  56.00-57.00  sec   112 MBytes   937 Mbits/sec
> [  5]  57.00-58.00  sec   112 MBytes   941 Mbits/sec
> [  5]  58.00-59.00  sec   111 MBytes   932 Mbits/sec
> [  5]  59.00-60.00  sec   112 MBytes   942 Mbits/sec
> [  5]  60.00-60.04  sec  4.06 MBytes   939 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-60.04  sec  6.57 GBytes   940 Mbits/sec                  receiver
> 
> regards,
> David
> 
Heiner
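
To make the rx_missed check repeatable across test runs, the "ethtool -S" output can be parsed instead of read by eye. The following is a rough sketch, assuming the "name: value" layout shown in the NIC statistics quoted above. The helper names and the choice of which counters count as errors are my own assumptions, not something defined by ethtool or the driver.

```python
import subprocess

# Counters from the statistics above that suggest trouble when non-zero.
# Which ones to watch is an assumption made for this sketch.
ERROR_COUNTERS = ("tx_errors", "rx_errors", "rx_missed", "align_errors",
                  "tx_aborted", "tx_underrun")

# Excerpt of the statistics quoted in this message.
SAMPLE = """NIC statistics:
     tx_packets: 1653804
     rx_packets: 1555966
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
"""


def parse_ethtool_stats(text):
    """Turn 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        # Skips the 'NIC statistics:' header, which has no numeric value.
        if sep and name and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def nonzero_errors(stats, names=ERROR_COUNTERS):
    """Return only the watched counters that are greater than zero."""
    return {n: stats[n] for n in names if stats.get(n, 0) > 0}


def read_stats(ifname):
    """Run 'ethtool -S <ifname>' and parse its output."""
    result = subprocess.run(["ethtool", "-S", ifname], check=True,
                            capture_output=True, text=True)
    return parse_ethtool_stats(result.stdout)


if __name__ == "__main__":
    print(nonzero_errors(parse_ethtool_stats(SAMPLE)))
```

Calling read_stats("enp1s0") before and after an iperf run and comparing the two dictionaries would show whether rx_missed grows under load.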

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
  2019-02-14  6:17                     ` Heiner Kallweit
@ 2019-02-15  2:51                       ` David Chang
  0 siblings, 0 replies; 25+ messages in thread
From: David Chang @ 2019-02-15  2:51 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen

Hi Heiner,

On Feb 14, 2019 at 07:17:44 +0100, Heiner Kallweit wrote:
> Hi David,
> 
> On 14.02.2019 03:45, David Chang wrote:
> > Hi Heiner,
> > 
> > On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> >> Hi David,
> >>
> >> meanwhile there's the following bug report matching what you reported.
> >> It's even the same chip version (RTL8168h).
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
> >>
> >> Symptom there is also a significant number of rx_missed packets.
> >> Could you try what I mentioned there last:
> >> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> >> end of rtl_hw_start_8168h_1() being disabled.
> > 
> > After disabling the ASPM call that you mentioned, we finally got a
> > positive test result, and the rx_missed errors were gone. Without
> > the patch, the receive side goes back to bad performance.
> > 
> Good to know, thanks. I also checked with Realtek; they confirmed that their Windows
> driver uses some heuristics to disable ASPM under high load. So it seems like there
> is some hw issue. What is still open is whether this affects only certain chip versions.
> Let's see whether they can provide more information.

Ok!

> Disabling ASPM in general would hurt notebook users, because past measurements
> show that ASPM can save a significant amount of energy.

I understand, thanks!

regards,
David
> 
> > kernel: r8169: loading out-of-tree module taints kernel.
> > kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
> > kernel: libphy: r8169: probed
> > kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
> > kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> > kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
> > kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
> > kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off
> > 
> > NIC statistics:
> >      tx_packets: 1653804
> >      rx_packets: 1555966
> >      tx_errors: 0
> >      rx_errors: 0
> >      rx_missed: 0
> >      align_errors: 0
> >      tx_single_collisions: 0
> >      tx_multi_collisions: 0
> >      unicast: 1555884
> >      broadcast: 78
> >      multicast: 4
> >      tx_aborted: 0
> >      tx_underrun: 0
> > 
> > iperf receive:
> > -----------------------------------------------------------
> > Server listening on 5201
> > -----------------------------------------------------------
> > Accepted connection from 10.x.x.x, port 55516
> > [  5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
> > [ ID] Interval           Transfer     Bitrate
> > [  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec
> > [  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
> > [  5]   2.00-3.00   sec   112 MBytes   940 Mbits/sec
> > [  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
> > [  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
> > [  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
> > [  5]   6.00-7.00   sec   112 MBytes   939 Mbits/sec
> > [  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
> > [  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec
> > [  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  10.00-11.00  sec   112 MBytes   941 Mbits/sec
> > [...]
> > [  5]  50.00-51.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  51.00-52.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  52.00-53.00  sec   112 MBytes   942 Mbits/sec
> > [  5]  53.00-54.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  54.00-55.00  sec   111 MBytes   934 Mbits/sec
> > [  5]  55.00-56.00  sec   112 MBytes   942 Mbits/sec
> > [  5]  56.00-57.00  sec   112 MBytes   937 Mbits/sec
> > [  5]  57.00-58.00  sec   112 MBytes   941 Mbits/sec
> > [  5]  58.00-59.00  sec   111 MBytes   932 Mbits/sec
> > [  5]  59.00-60.00  sec   112 MBytes   942 Mbits/sec
> > [  5]  60.00-60.04  sec  4.06 MBytes   939 Mbits/sec
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate
> > [  5]   0.00-60.04  sec  6.57 GBytes   940 Mbits/sec                  receiver
> > 
> > regards,
> > David
> > 
> Heiner
> 

Thread overview: 25+ messages
2019-01-28 11:13 r8169 Driver - Poor Network Performance Since Kernel 4.19 Peter Ceiley
2019-01-28 18:28 ` Heiner Kallweit
2019-01-28 22:10   ` Peter Ceiley
2019-01-29  6:16     ` Heiner Kallweit
2019-01-29  6:20       ` Peter Ceiley
2019-01-29  6:44         ` Heiner Kallweit
2019-01-30  9:59           ` Peter Ceiley
2019-01-30 19:15             ` Heiner Kallweit
2019-01-31  2:32               ` David Chang
2019-01-31  6:21                 ` Heiner Kallweit
2019-01-31  6:35                   ` Heiner Kallweit
2019-01-31  6:49                     ` Heiner Kallweit
2019-01-31  7:23                     ` David Chang
2019-01-31 12:09                       ` Peter Ceiley
2019-01-31 18:28                         ` Heiner Kallweit
2019-02-01  4:27                           ` David Chang
2019-02-01  4:29                     ` David Chang
2019-02-01  6:32                       ` Heiner Kallweit
2019-02-02 12:25                 ` Heiner Kallweit
2019-02-05 18:50                 ` Heiner Kallweit
2019-02-05 18:53                   ` Heiner Kallweit
2019-02-11  6:23                   ` David Chang
2019-02-14  2:45                   ` David Chang
2019-02-14  6:17                     ` Heiner Kallweit
2019-02-15  2:51                       ` David Chang
