linux-kernel.vger.kernel.org archive mirror
* kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
@ 2018-09-04  6:19 David Arendt
  2018-09-15 21:23 ` David Arendt
  0 siblings, 1 reply; 14+ messages in thread
From: David Arendt @ 2018-09-04  6:19 UTC (permalink / raw)
  To: linux-kernel

Hi,

When using kernel 4.18.5 the Realtek 8111G network adapter stops
responding under high system load.

Dmesg is showing no errors.

Sometimes an ifconfig enp3s0 down followed by an ifconfig enp3s0 up is
enough for the network adapter to start responding again. Sometimes a
reboot is necessary.

When copying r8169.c from 4.17.14 to the 4.18.5 kernel, networking is
perfectly stable on 4.18.5, so the problem seems to be related to r8169.c.

Here is the output from lshw:

        *-pci:2
             description: PCI bridge
             product: 8 Series/C220 Series Chipset Family PCI Express Root Port #3
             vendor: Intel Corporation
             physical id: 1c.2
             bus info: pci@0000:00:1c.2
             version: d5
             width: 32 bits
             clock: 33MHz
             capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
             configuration: driver=pcieport
             resources: irq:18 ioport:d000(size=4096) memory:f7300000-f73fffff ioport:f2100000(size=1048576)
           *-network
                description: Ethernet interface
                product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
                vendor: Realtek Semiconductor Co., Ltd.
                physical id: 0
                bus info: pci@0000:03:00.0
                logical name: enp3s0
                version: 0c
                serial: <hidden>
                size: 1Gbit/s
                capacity: 1Gbit/s
                width: 64 bits
                clock: 33MHz
                capabilities: pm msi pciexpress msix vpd bus_master cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
                configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=2.3LK-NAPI duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
                resources: irq:18 ioport:d000(size=256) memory:f7300000-f7300fff memory:f2100000-f2103fff

Thanks in advance for looking into this,

David Arendt



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-04  6:19 kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load David Arendt
@ 2018-09-15 21:23 ` David Arendt
  2018-09-15 23:54   ` Maciej S. Szmigiero
  0 siblings, 1 reply; 14+ messages in thread
From: David Arendt @ 2018-09-15 21:23 UTC (permalink / raw)
  To: linux-kernel

Hi,

just a follow-up:

With kernel 4.18.8 the behaviour is different.

The network becomes unreachable a number of times and recovers by
itself each time, before it finally becomes unreachable for good.

Here is the log output:

Sep 15 17:44:43 server kernel: NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
Sep 15 17:44:43 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 18:10:26 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 18:12:24 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 18:13:19 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 18:14:48 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 18:20:24 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 18:34:19 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 18:43:43 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 18:46:26 server kernel: r8169 0000:03:00.0 enp3s0: link up
Sep 15 19:00:24 server kernel: r8169 0000:03:00.0 enp3s0: link up

From 17:44 to 18:46 the network recovers automatically. After the
link up at 19:00, the network is no longer reachable, and no further
message is logged.

According to ifconfig, the counter for TX packets keeps incrementing,
while the counter for RX packets does not.

Here again, the driver from 4.17.14 works flawlessly.

Thanks in advance,
David Arendt




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-15 21:23 ` David Arendt
@ 2018-09-15 23:54   ` Maciej S. Szmigiero
  2018-09-16 12:38     ` David Arendt
  2018-09-18 10:23     ` David Arendt
  0 siblings, 2 replies; 14+ messages in thread
From: Maciej S. Szmigiero @ 2018-09-15 23:54 UTC (permalink / raw)
  To: David Arendt; +Cc: linux-kernel, nic_swsd, netdev

[ I've added Realtek Linux NIC and netdev mailing lists to CC ]

Hi David,

On 15.09.2018 23:23, David Arendt wrote:
> Hi,
> 
> just a follow-up:
> 
> With kernel 4.18.8 the behaviour is different.
> 
> The network becomes unreachable a number of times and recovers by
> itself each time, before it finally becomes unreachable for good.
> 
> Here is the log output:
> 
> Sep 15 17:44:43 server kernel: NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
> Sep 15 17:44:43 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 18:10:26 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 18:12:24 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 18:13:19 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 18:14:48 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 18:20:24 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 18:34:19 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 18:43:43 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 18:46:26 server kernel: r8169 0000:03:00.0 enp3s0: link up
> Sep 15 19:00:24 server kernel: r8169 0000:03:00.0 enp3s0: link up
> 
> From 17:44 to 18:46 the network recovers automatically. After the
> link up at 19:00, the network is no longer reachable, and no further
> message is logged.
> 
> According to ifconfig, the counter for TX packets keeps incrementing,
> while the counter for RX packets does not.
> 
> Here again, the driver from 4.17.14 works flawlessly.

Could you please try this patch on top of 4.18.8:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f74dd480cf4e31e12971c58a1d832044db945670

In my case the problem fixed by the above commit was limited to
bad TX performance, but my r8169 NIC models were different from
yours.

If this does not help, then try bisecting the issue
(maybe limited to drivers/net/ethernet/realtek/r8169.c to save time).
If the NIC dies under heavy load, it might be possible to generate
such load quickly with the in-kernel pktgen.

If that's not possible, then please at least compare the NIC register
values displayed by "ethtool -d enp3s0" between working and
non-working kernels.

> Thanks in advance,
> David Arendt

Maciej



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-15 23:54   ` Maciej S. Szmigiero
@ 2018-09-16 12:38     ` David Arendt
  2018-09-16 23:11       ` Maciej S. Szmigiero
  2018-09-18 10:23     ` David Arendt
  1 sibling, 1 reply; 14+ messages in thread
From: David Arendt @ 2018-09-16 12:38 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: linux-kernel, nic_swsd, netdev

Hi,

I applied the patch one hour ago. So far there have been no problems,
but because the problems sometimes appeared only after a few hours, I
will only know for certain tomorrow whether the patch helped.

If not, I will try bisecting the problem.

For information, here are the differences in the ethtool output between
the working driver from 4.17.14 and the patched one from 4.18.8:

--- working.txt 2018-09-16 14:14:00.544376935 +0200
+++ patched.txt 2018-09-16 14:20:09.445660915 +0200
@@ -5,2 +5,2 @@
-0x10: Dump Tally Counter Command   0xf900c000 0x00000007
-0x20: Tx Normal Priority Ring Addr 0xf3aa7000 0x00000007
+0x10: Dump Tally Counter Command   0xf9260000 0x00000007
+0x20: Tx Normal Priority Ring Addr 0xebb73000 0x00000007
@@ -17 +17 @@
-0x40: Tx Configuration                        0x4f000f80
+0x40: Tx Configuration                        0x4f000f00
@@ -31,2 +31,2 @@
-0x64: TBI control and status                  0x17ffff01
-0x68: TBI Autonegotiation advertisement (ANAR)    0xf70c
+0x64: TBI control and status                  0x00000000
+0x68: TBI Autonegotiation advertisement (ANAR)    0x0000
@@ -35 +35 @@
-0x84: PM wakeup frame 0            0x04000000 0x7c5b5c95
+0x84: PM wakeup frame 0            0x04000000 0x710b8deb
@@ -57 +57 @@
-0xE4: Rx Ring Addr                 0xf3b64000 0x00000007
+0xE4: Rx Ring Addr                 0xef9f0000 0x00000007

Thanks in advance,
David Arendt



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-16 12:38     ` David Arendt
@ 2018-09-16 23:11       ` Maciej S. Szmigiero
  0 siblings, 0 replies; 14+ messages in thread
From: Maciej S. Szmigiero @ 2018-09-16 23:11 UTC (permalink / raw)
  To: David Arendt; +Cc: linux-kernel, nic_swsd, netdev, Heiner Kallweit

On 16.09.2018 14:38, David Arendt wrote:
> Hi,
> 
(..)
> 
> For information, here are the differences in the ethtool output between
> the working driver from 4.17.14 and the patched one from 4.18.8:
> 
> --- working.txt 2018-09-16 14:14:00.544376935 +0200
> +++ patched.txt 2018-09-16 14:20:09.445660915 +0200
> @@ -17 +17 @@
> -0x40: Tx Configuration                        0x4f000f80
> +0x40: Tx Configuration                        0x4f000f00

TXCFG_AUTO_FIFO was set in TxConfig by the working driver, but the
current driver version apparently fails to set it.

Looking at the config code for your NIC model (I guess it is XID 4c000800,
i.e. RTL_GIGA_MAC_VER_40), that bit should be set by rtl_hw_start_8168g(),
which is called from rtl_hw_start_8168g_1(), which in turn is called
from rtl_hw_start_8168().

However, after rtl_hw_start_8168() is called from rtl_hw_start()
(as tp->hw_start(tp)), a call to rtl_set_tx_config_registers() is made,
which overwrites TxConfig completely, zeroing the aforementioned bit.
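
The ordering problem described above can be modelled in user space. The following is only a sketch: struct fake_nic and all function names are hypothetical, the constant values are assumptions modelled on r8169.c (only the 0x80 bit is confirmed by the register dump in this thread), and the read-only upper bits of the real register are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; only bit 7 (0x80) is confirmed by the dumps above. */
#define TXCFG_AUTO_FIFO (UINT32_C(1) << 7)
#define TX_DMA_BURST    UINT32_C(7)
#define TX_DMA_SHIFT    8
#define INTER_FRAME_GAP UINT32_C(3)
#define TX_IFG_SHIFT    24

/* Stand-in for the NIC: a single software copy of TxConfig. */
struct fake_nic {
	uint32_t tx_config;
};

/* Chip-specific step: read-modify-write that sets only AUTO_FIFO. */
static void chip_specific_start(struct fake_nic *nic)
{
	assert(nic);
	nic->tx_config |= TXCFG_AUTO_FIFO;
}

/* Common step: plain write that replaces the whole register. */
static void set_tx_config_registers(struct fake_nic *nic)
{
	assert(nic);
	nic->tx_config = (TX_DMA_BURST << TX_DMA_SHIFT) |
			 (INTER_FRAME_GAP << TX_IFG_SHIFT);
}

/* Order described above: chip-specific code first, common write last. */
static uint32_t start_chip_then_common(void)
{
	struct fake_nic nic = { 0 };

	chip_specific_start(&nic);
	set_tx_config_registers(&nic);
	return nic.tx_config;
}

/* Reverse order, for comparison: common write first, bit set afterwards. */
static uint32_t start_common_then_chip(void)
{
	struct fake_nic nic = { 0 };

	set_tx_config_registers(&nic);
	chip_specific_start(&nic);
	return nic.tx_config;
}
```

In this model start_chip_then_common() returns 0x03000700 (bit 7 lost) and start_common_then_chip() returns 0x03000780 (bit 7 kept), which matches the 0x80 difference seen in the register dumps.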

It looks like this was first introduced by commit
4fd48c4ac0a0 ("r8169: move common initializations to tp->hw_start").
I have added its author (Heiner Kallweit) to CC.

@Heiner: could you have a look at this?

Maciej

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-15 23:54   ` Maciej S. Szmigiero
  2018-09-16 12:38     ` David Arendt
@ 2018-09-18 10:23     ` David Arendt
  2018-09-18 22:30       ` Maciej S. Szmigiero
  1 sibling, 1 reply; 14+ messages in thread
From: David Arendt @ 2018-09-18 10:23 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: linux-kernel, nic_swsd, netdev

Hi,

Today I had the network adapter problems again, so the patch doesn't
seem to change anything regarding this problem. This week my time is
unfortunately very limited, but I will try to find some time next
weekend to look a bit more into the issue.

Thanks in advance,
David Arendt



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-18 10:23     ` David Arendt
@ 2018-09-18 22:30       ` Maciej S. Szmigiero
  2018-09-19  4:12         ` David Arendt
  2018-09-19 16:34         ` David Arendt
  0 siblings, 2 replies; 14+ messages in thread
From: Maciej S. Szmigiero @ 2018-09-18 22:30 UTC (permalink / raw)
  To: David Arendt; +Cc: linux-kernel, nic_swsd, netdev, Heiner Kallweit

Hi,

On 18.09.2018 12:23, David Arendt wrote:
> Hi,
> 
> Today I had the network adapter problems again.
> So the patch doesn't seem to change anything regarding this problem.
> This week my time is unfortunately very limited, but I will try to
> find some time next weekend to look a bit more into the issue.

If the problem is caused by missing TXCFG_AUTO_FIFO bit in TxConfig,
as the register difference would suggest, then you can try applying
the following patch (hack) on top of 4.18.8 that is already patched
with commit f74dd480cf4e:
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5043,7 +5043,8 @@
 {
 	/* Set DMA burst size and Interframe Gap Time */
 	RTL_W32(tp, TxConfig, (TX_DMA_BURST << TxDMAShift) |
-		(InterFrameGap << TxInterFrameGapShift));
+		(InterFrameGap << TxInterFrameGapShift)
+		| TXCFG_AUTO_FIFO);
 }
 
 static void rtl_set_rx_max_size(struct rtl8169_private *tp)

This hack will probably only work properly on RTL_GIGA_MAC_VER_40 or
later NICs.

Before running any tests please verify with "ethtool -d enp3s0" that
TxConfig register now contains 0x4f000f80, as it did in the old,
working driver version.

If this does not help then a bisection will most likely be needed.

> Thanks in advance,
> David Arendt

Maciej

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-18 22:30       ` Maciej S. Szmigiero
@ 2018-09-19  4:12         ` David Arendt
  2018-09-25 21:03           ` Heiner Kallweit
  2018-09-19 16:34         ` David Arendt
  1 sibling, 1 reply; 14+ messages in thread
From: David Arendt @ 2018-09-19  4:12 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: linux-kernel, nic_swsd, netdev, Heiner Kallweit

Hi,

Thanks for the patch.

I just applied it and the TxConfig register now contains 0x4f000f80.
The next day will show if it really solves the problem.

Thanks in advance,
David Arendt




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-18 22:30       ` Maciej S. Szmigiero
  2018-09-19  4:12         ` David Arendt
@ 2018-09-19 16:34         ` David Arendt
  2018-09-21 22:28           ` Maciej S. Szmigiero
  1 sibling, 1 reply; 14+ messages in thread
From: David Arendt @ 2018-09-19 16:34 UTC (permalink / raw)
  To: Maciej S. Szmigiero; +Cc: linux-kernel, nic_swsd, netdev, Heiner Kallweit

Hi,

The networking problem has not occurred for 12 hours now, so I think
this patch resolved the problem.

Thanks,
David Arendt




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-19 16:34         ` David Arendt
@ 2018-09-21 22:28           ` Maciej S. Szmigiero
  0 siblings, 0 replies; 14+ messages in thread
From: Maciej S. Szmigiero @ 2018-09-21 22:28 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: David Arendt, linux-kernel, nic_swsd, netdev

On 19.09.2018 18:34, David Arendt wrote:
> Hi,
> 
> The networking problem has not occurred for 12 hours now, so I think
> this patch resolved the problem.

@Heiner:
It seems that the regression was introduced by your
commit 4fd48c4ac0a0 ("r8169: move common initializations to tp->hw_start").

Will you submit a fix for it?

Thanks,
Maciej

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-19  4:12         ` David Arendt
@ 2018-09-25 21:03           ` Heiner Kallweit
  2018-09-26 16:44             ` David Arendt
                               ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Heiner Kallweit @ 2018-09-25 21:03 UTC (permalink / raw)
  To: David Arendt, Maciej S. Szmigiero, Gabriel C, Ortwin Glück
  Cc: linux-kernel, nic_swsd, netdev

On 19.09.2018 06:12, David Arendt wrote:
> Hi,
> 
> Thanks for the patch.
> 
> I just applied it and the TxConfig register now contains 0x4f000f80.
> The next day will show if it really solves the problem.
> 
> Thanks in advance,
> David Arendt
> 
> On 9/19/18 12:30 AM, Maciej S. Szmigiero wrote:
>> Hi,
>>
>> On 18.09.2018 12:23, David Arendt wrote:
>>> Hi,
>>>
>>> Today I had the network adapter problems again.
>>> So the patch doesn't seem to change anything regarding this problem.
>>> This week my time is unfortunately very limited, but I will try to
>>> find some time next weekend to look a bit more into the issue.
>> If the problem is caused by missing TXCFG_AUTO_FIFO bit in TxConfig,
>> as the register difference would suggest, then you can try applying
>> the following patch (hack) on top of 4.18.8 that is already patched
>> with commit f74dd480cf4e:
>> --- a/drivers/net/ethernet/realtek/r8169.c
>> +++ b/drivers/net/ethernet/realtek/r8169.c
>> @@ -5043,7 +5043,8 @@
>>  {
>>  	/* Set DMA burst size and Interframe Gap Time */
>>  	RTL_W32(tp, TxConfig, (TX_DMA_BURST << TxDMAShift) |
>> -		(InterFrameGap << TxInterFrameGapShift));
>> +		(InterFrameGap << TxInterFrameGapShift)
>> +		| TXCFG_AUTO_FIFO);
>>  }
>>  
>>  static void rtl_set_rx_max_size(struct rtl8169_private *tp)
>>
>> This hack will probably only work properly on RTL_GIGA_MAC_VER_40 or
>> later NICs.
>>
>> Before running any tests please verify with "ethtool -d enp3s0" that
>> TxConfig register now contains 0x4f000f80, as it did in the old,
>> working driver version.
>>
>> If this does not help then a bisection will most likely be needed.
>>
>>> Thanks in advance,
>>> David Arendt
>> Maciej
> 
> 
> 
@Gabriel:
Thanks for the hint, I wasn't fully aware of this thread.
@Maciej:
Thanks for the analysis.

It seems that all chip versions from 34 (= RTL8168E-VL) with the
exception of version 39 (= RTL8106E, first sub-version) need
bit TXCFG_AUTO_FIFO.

And indeed, due to reordering of calls this bit is overwritten.
Following patch moves setting the bit from the chip-specific
hw_start function to rtl_set_tx_config_registers().
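
The overwrite described above can be sketched in user space. This is only a model under stated assumptions: TxConfig is an ordinary variable rather than a device register, the constant values are assumed to match the definitions in r8169.c, and the function names (tx_config_base, start_original_order, start_reordered, start_patched) are hypothetical and do not exist in the driver.

```c
#include <stdint.h>

/* Assumed values, intended to mirror r8169.c; verify against your tree. */
#define TX_DMA_BURST		7u
#define INTER_FRAME_GAP		3u
#define TX_DMA_SHIFT		8
#define TX_IFG_SHIFT		24
#define TXCFG_AUTO_FIFO		(1u << 7)

/* Value the common setup writes: DMA burst size and interframe gap only. */
uint32_t tx_config_base(void)
{
	return (TX_DMA_BURST << TX_DMA_SHIFT) |
	       (INTER_FRAME_GAP << TX_IFG_SHIFT);
}

/* Common setup first, then the chip-specific read-modify-write. */
uint32_t start_original_order(void)
{
	uint32_t reg = 0;

	reg = tx_config_base();
	reg |= TXCFG_AUTO_FIFO;		/* bit survives */
	return reg;
}

/* Chip-specific write first, then the common setup overwrites it. */
uint32_t start_reordered(void)
{
	uint32_t reg = 0;

	reg |= TXCFG_AUTO_FIFO;
	reg = tx_config_base();		/* plain write clears the bit */
	return reg;
}

/* Version rule taken from the patch: 34 and later, except 39. */
int needs_auto_fifo(int mac_version)
{
	return mac_version >= 34 && mac_version != 39;
}

/* Patched common setup: the bit is part of the single plain write. */
uint32_t start_patched(int mac_version)
{
	uint32_t val = tx_config_base();

	if (needs_auto_fifo(mac_version))
		val |= TXCFG_AUTO_FIFO;
	return val;
}
```

With these assumed constants, start_reordered() returns a value without bit 7, while start_patched() sets it regardless of call order for every version that passes needs_auto_fifo(). Setting the bit in the one function that does the plain write removes the dependency on call ordering.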

Whoever is hit by the issue and has the option to build a kernel,
could you please test whether the patch fixes the issue for you?

Thanks, Heiner

---
 drivers/net/ethernet/realtek/r8169.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index f882be49f..ae8abe900 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4514,9 +4514,14 @@ static void rtl8169_hw_reset(struct rtl8169_private *tp)
 
 static void rtl_set_tx_config_registers(struct rtl8169_private *tp)
 {
-	/* Set DMA burst size and Interframe Gap Time */
-	RTL_W32(tp, TxConfig, (TX_DMA_BURST << TxDMAShift) |
-		(InterFrameGap << TxInterFrameGapShift));
+	u32 val = TX_DMA_BURST << TxDMAShift |
+		  InterFrameGap << TxInterFrameGapShift;
+
+	if (tp->mac_version >= RTL_GIGA_MAC_VER_34 &&
+	    tp->mac_version != RTL_GIGA_MAC_VER_39)
+		val |= TXCFG_AUTO_FIFO;
+
+	RTL_W32(tp, TxConfig, val);
 }
 
 static void rtl_set_rx_max_size(struct rtl8169_private *tp)
@@ -5011,7 +5016,6 @@ static void rtl_hw_start_8168e_2(struct rtl8169_private *tp)
 
 	rtl_disable_clock_request(tp);
 
-	RTL_W32(tp, TxConfig, RTL_R32(tp, TxConfig) | TXCFG_AUTO_FIFO);
 	RTL_W8(tp, MCU, RTL_R8(tp, MCU) & ~NOW_IS_OOB);
 
 	/* Adjust EEE LED frequency */
@@ -5045,7 +5049,6 @@ static void rtl_hw_start_8168f(struct rtl8169_private *tp)
 
 	rtl_disable_clock_request(tp);
 
-	RTL_W32(tp, TxConfig, RTL_R32(tp, TxConfig) | TXCFG_AUTO_FIFO);
 	RTL_W8(tp, MCU, RTL_R8(tp, MCU) & ~NOW_IS_OOB);
 	RTL_W8(tp, DLLPR, RTL_R8(tp, DLLPR) | PFM_EN);
 	RTL_W32(tp, MISC, RTL_R32(tp, MISC) | PWM_EN);
@@ -5090,8 +5093,6 @@ static void rtl_hw_start_8411(struct rtl8169_private *tp)
 
 static void rtl_hw_start_8168g(struct rtl8169_private *tp)
 {
-	RTL_W32(tp, TxConfig, RTL_R32(tp, TxConfig) | TXCFG_AUTO_FIFO);
-
 	rtl_eri_write(tp, 0xc8, ERIAR_MASK_0101, 0x080002, ERIAR_EXGMAC);
 	rtl_eri_write(tp, 0xcc, ERIAR_MASK_0001, 0x38, ERIAR_EXGMAC);
 	rtl_eri_write(tp, 0xd0, ERIAR_MASK_0001, 0x48, ERIAR_EXGMAC);
@@ -5189,8 +5190,6 @@ static void rtl_hw_start_8168h_1(struct rtl8169_private *tp)
 	rtl_hw_aspm_clkreq_enable(tp, false);
 	rtl_ephy_init(tp, e_info_8168h_1, ARRAY_SIZE(e_info_8168h_1));
 
-	RTL_W32(tp, TxConfig, RTL_R32(tp, TxConfig) | TXCFG_AUTO_FIFO);
-
 	rtl_eri_write(tp, 0xc8, ERIAR_MASK_0101, 0x00080002, ERIAR_EXGMAC);
 	rtl_eri_write(tp, 0xcc, ERIAR_MASK_0001, 0x38, ERIAR_EXGMAC);
 	rtl_eri_write(tp, 0xd0, ERIAR_MASK_0001, 0x48, ERIAR_EXGMAC);
@@ -5273,8 +5272,6 @@ static void rtl_hw_start_8168ep(struct rtl8169_private *tp)
 {
 	rtl8168ep_stop_cmac(tp);
 
-	RTL_W32(tp, TxConfig, RTL_R32(tp, TxConfig) | TXCFG_AUTO_FIFO);
-
 	rtl_eri_write(tp, 0xc8, ERIAR_MASK_0101, 0x00080002, ERIAR_EXGMAC);
 	rtl_eri_write(tp, 0xcc, ERIAR_MASK_0001, 0x2f, ERIAR_EXGMAC);
 	rtl_eri_write(tp, 0xd0, ERIAR_MASK_0001, 0x5f, ERIAR_EXGMAC);
@@ -5596,7 +5593,6 @@ static void rtl_hw_start_8402(struct rtl8169_private *tp)
 	/* Force LAN exit from ASPM if Rx/Tx are not idle */
 	RTL_W32(tp, FuncEvent, RTL_R32(tp, FuncEvent) | 0x002800);
 
-	RTL_W32(tp, TxConfig, RTL_R32(tp, TxConfig) | TXCFG_AUTO_FIFO);
 	RTL_W8(tp, MCU, RTL_R8(tp, MCU) & ~NOW_IS_OOB);
 
 	rtl_ephy_init(tp, e_info_8402, ARRAY_SIZE(e_info_8402));
-- 
2.19.0




* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-25 21:03           ` Heiner Kallweit
@ 2018-09-26 16:44             ` David Arendt
  2018-09-27 15:48             ` Ortwin Glück
  2018-09-27 19:33             ` David Arendt
  2 siblings, 0 replies; 14+ messages in thread
From: David Arendt @ 2018-09-26 16:44 UTC (permalink / raw)
  To: Heiner Kallweit, Maciej S. Szmigiero, Gabriel C, Ortwin Glück
  Cc: linux-kernel, nic_swsd, netdev

Hi,

Thanks.

I have just applied Heiner Kallweit's patch on top of kernel 4.18.10 and
the TxConfig register contains 0x4f000f80.

I will give it 24 hours under high load and report back if the patch
really solves the problem.
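
For anyone repeating this check, a minimal sketch of decoding a TxConfig value taken from the register dump. The bit positions and field widths (bit 7 for TXCFG_AUTO_FIFO, a 3-bit DMA burst field at bit 8, a 2-bit interframe gap field at bit 24) are assumptions based on the driver constants, and struct txcfg_fields and decode_txconfig are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define TXCFG_AUTO_FIFO	(1u << 7)	/* assumed bit position */

/* Hypothetical container for the fields of interest. */
struct txcfg_fields {
	unsigned int dma_burst;	/* assumed 3-bit field at bit 8 */
	unsigned int ifg;	/* assumed 2-bit field at bit 24 */
	int auto_fifo;		/* 1 if bit 7 is set, else 0 */
};

/* Split a raw TxConfig value into the fields the patch touches. */
struct txcfg_fields decode_txconfig(uint32_t val)
{
	struct txcfg_fields f;

	f.dma_burst = (val >> 8) & 0x7;
	f.ifg = (val >> 24) & 0x3;
	f.auto_fifo = (val & TXCFG_AUTO_FIFO) != 0;
	return f;
}
```

Under these assumptions, decode_txconfig(0x4f000f80) reports the auto-FIFO bit as set. The remaining bits of the dumped value are not interpreted here.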

Bye,
David Arendt





* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-25 21:03           ` Heiner Kallweit
  2018-09-26 16:44             ` David Arendt
@ 2018-09-27 15:48             ` Ortwin Glück
  2018-09-27 19:33             ` David Arendt
  2 siblings, 0 replies; 14+ messages in thread
From: Ortwin Glück @ 2018-09-27 15:48 UTC (permalink / raw)
  To: Heiner Kallweit, David Arendt, Maciej S. Szmigiero, Gabriel C
  Cc: linux-kernel, nic_swsd, netdev

On 25.09.18 23:03, Heiner Kallweit wrote:
> It seems that all chip versions from 34 (= RTL8168E-VL) with the
> exception of version 39 (= RTL8106E, first sub-version) need
> bit TXCFG_AUTO_FIFO.
> 
> And indeed, due to reordering of calls this bit is overwritten.
> Following patch moves setting the bit from the chip-specific
> hw_start function to rtl_set_tx_config_registers().
> 
> Whoever is hit by the issue and has the option to build a kernel,
> could you please test whether the patch fixes the issue for you?

Hi,

Looks good so far! No problems for almost 24 hours. This is on a router/firewall that links various 
sites via IPSec and other VPNs and has >10 network interfaces, 5 of which are Realtek ones.

Thanks,

Ortwin

# uname -a
Linux lofw 4.18.10+ #72 SMP PREEMPT Wed Sep 26 17:07:07 CEST 2018 x86_64 Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz GenuineIntel GNU/Linux
# uptime
  17:42:37 up 22:54,  1 user,  load average: 0.48, 0.38, 0.30
# ifconfig wan
wan: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 81.7.230.226  netmask 255.255.255.248  broadcast 81.7.230.231
         inet6 fe80::529a:4cff:fe2e:92be  prefixlen 64  scopeid 0x20<link>
         ether 50:9a:4c:2e:92:be  txqueuelen 100  (Ethernet)
         RX packets 56342905  bytes 40589502599 (37.8 GiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 54032328  bytes 44607761646 (41.5 GiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

# ifconfig lan
lan: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         inet 10.11.1.1  netmask 255.255.255.0  broadcast 10.11.1.255
         inet6 fe80::20a:cdff:fe31:6022  prefixlen 64  scopeid 0x20<link>
         ether 00:0a:cd:31:60:22  txqueuelen 100  (Ethernet)
         RX packets 54799469  bytes 43111097607 (40.1 GiB)
         RX errors 0  dropped 0  overruns 0  frame 0
         TX packets 55158558  bytes 35746992802 (33.2 GiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


* Re: kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load
  2018-09-25 21:03           ` Heiner Kallweit
  2018-09-26 16:44             ` David Arendt
  2018-09-27 15:48             ` Ortwin Glück
@ 2018-09-27 19:33             ` David Arendt
  2 siblings, 0 replies; 14+ messages in thread
From: David Arendt @ 2018-09-27 19:33 UTC (permalink / raw)
  To: Heiner Kallweit, Maciej S. Szmigiero, Gabriel C, Ortwin Glück
  Cc: linux-kernel, nic_swsd, netdev

Hi,

Heiner Kallweit's patch seems to resolve the problem. The machine was
under high disk and network I/O pressure today and networking was
perfectly stable.

Bye,
David Arendt





Thread overview: 14+ messages
2018-09-04  6:19 kernel 4.18.5 Realtek 8111G network adapter stops responding under high system load David Arendt
2018-09-15 21:23 ` David Arendt
2018-09-15 23:54   ` Maciej S. Szmigiero
2018-09-16 12:38     ` David Arendt
2018-09-16 23:11       ` Maciej S. Szmigiero
2018-09-18 10:23     ` David Arendt
2018-09-18 22:30       ` Maciej S. Szmigiero
2018-09-19  4:12         ` David Arendt
2018-09-25 21:03           ` Heiner Kallweit
2018-09-26 16:44             ` David Arendt
2018-09-27 15:48             ` Ortwin Glück
2018-09-27 19:33             ` David Arendt
2018-09-19 16:34         ` David Arendt
2018-09-21 22:28           ` Maciej S. Szmigiero
