regressions.lists.linux.dev archive mirror
* [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement
@ 2022-06-08  0:44 Bernhard Hampel-Waffenthal
  2022-06-08  7:30 ` Heiner Kallweit
  2022-06-08 13:19 ` [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement Hans de Goede
  0 siblings, 2 replies; 5+ messages in thread
From: Bernhard Hampel-Waffenthal @ 2022-06-08  0:44 UTC
  To: Heiner Kallweit; +Cc: nic_swsd, netdev, regressions, Jakub Kicinski

#regzbot introduced: 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71

Hi,

since the last major kernel version upgrade to 5.18 on Arch Linux I'm 
unable to get a usable ethernet connection on my desktop PC.

I can see a timeout in the logs

 > kernel: NETDEV WATCHDOG: enp37s0 (r8169): transmit queue 0 timed out

and regular very likely related errors after

 > kernel: r8169 0000:25:00.0 enp37s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).


The link does manage to go up at nominal full 1Gbps speed, but there is 
no usable connection to speak of and pings are very bursty and take 
multiple seconds.

I was able to pinpoint that the problems were introduced in commit 
4b5f82f6aaef3fa95cce52deb8510f55ddda6a71 with the enablement of ASPM 
L1/L1.1 for ">= RTL_GIGA_MAC_VER_45", which my chip falls under. Adding 
pcie_aspm=off to the kernel command line or changing that check to ">= 
RTL_GIGA_MAC_VER_60" for testing purposes and recompiling the kernel 
fixes my problems.
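
To confirm that the pcie_aspm=off workaround is actually in effect after a reboot, the command line of the running kernel can be inspected. This is only a minimal sketch in Python; the helper name aspm_disabled_on_cmdline is made up for illustration, and it assumes the command line is read from the usual /proc/cmdline file.

```python
def aspm_disabled_on_cmdline(cmdline: str) -> bool:
    """Return True if the given kernel command line contains pcie_aspm=off."""
    # Kernel parameters are whitespace-separated; compare whole tokens so
    # that a longer parameter that merely starts with this text does not match.
    return "pcie_aspm=off" in cmdline.split()


# Typical use on a running system (the path is the standard procfs location):
#   with open("/proc/cmdline") as f:
#       print(aspm_disabled_on_cmdline(f.read()))
```

Note that this only checks that the parameter was passed; it says nothing about which ASPM states the link ends up using.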


I'm using a MSI B450I GAMING PLUS AC motherboard with a RTL8168h chip as 
per dmesg:

 > r8169 0000:25:00.0 eth0: RTL8168h/8111h, 30:9c:23:de:97:a9, XID 541, IRQ 101

lspci says:

 > 25:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
         Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:7a40]
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
         Latency: 0, Cache Line Size: 64 bytes
         Interrupt: pin A routed to IRQ 30
         IOMMU group: 14
         Region 0: I/O ports at f000 [size=256]
         Region 2: Memory at fcb04000 (64-bit, non-prefetchable) [size=4K]
         Region 4: Memory at fcb00000 (64-bit, non-prefetchable) [size=16K]
         Capabilities: <access denied>
         Kernel driver in use: r8169
         Kernel modules: r8169


If you need more info I'll do my best to provide what I can, hope that 
helps already.

Regards,
Bernhard


* Re: [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement
  2022-06-08  0:44 [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement Bernhard Hampel-Waffenthal
@ 2022-06-08  7:30 ` Heiner Kallweit
  2022-06-20  6:40   ` Thorsten Leemhuis
  2022-06-08 13:19 ` [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement Hans de Goede
  1 sibling, 1 reply; 5+ messages in thread
From: Heiner Kallweit @ 2022-06-08  7:30 UTC
  To: Bernhard Hampel-Waffenthal; +Cc: nic_swsd, netdev, regressions, Jakub Kicinski

On 08.06.2022 02:44, Bernhard Hampel-Waffenthal wrote:
> #regzbot introduced: 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71
>
> Hi,
>
> since the last major kernel version upgrade to 5.18 on Arch Linux I'm unable to get a usable ethernet connection on my desktop PC.
>
> I can see a timeout in the logs
>
> > kernel: NETDEV WATCHDOG: enp37s0 (r8169): transmit queue 0 timed out
>
> and regular very likely related errors after
>
> > kernel: r8169 0000:25:00.0 enp37s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
>
>
> The link does manage to go up at nominal full 1Gbps speed, but there is no usable connection to speak of and pings are very bursty and take multiple seconds.
>
> I was able to pinpoint that the problems were introduced in commit 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71 with the enablement of ASPM L1/L1.1 for ">= RTL_GIGA_MAC_VER_45", which my chip falls under. Adding pcie_aspm=off to the kernel command line or changing that check to ">= RTL_GIGA_MAC_VER_60" for testing purposes and recompiling the kernel fixes my problems.
>
>
> I'm using a MSI B450I GAMING PLUS AC motherboard with a RTL8168h chip as per dmesg:
>
> > r8169 0000:25:00.0 eth0: RTL8168h/8111h, 30:9c:23:de:97:a9, XID 541, IRQ 101
>
> lspci says:
>
> > 25:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
>         Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:7a40]
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 30
>         IOMMU group: 14
>         Region 0: I/O ports at f000 [size=256]
>         Region 2: Memory at fcb04000 (64-bit, non-prefetchable) [size=4K]
>         Region 4: Memory at fcb00000 (64-bit, non-prefetchable) [size=16K]
>         Capabilities: <access denied>
>         Kernel driver in use: r8169
>         Kernel modules: r8169
>
Thanks for the report. On my test systems RTL8168h works fine with ASPM L1 and L1.1, so it seems to be
a board-specific issue. Some reports in the past indicated that changing IOMMU settings may help.
You can also use the ASPM sysfs link attributes to disable selected ASPM states for just this link.
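
For illustration, the sysfs link attributes live under the device's "link" directory, e.g. /sys/bus/pci/devices/0000:25:00.0/link/l1_aspm and l1_1_aspm, and accept 0 or 1. A minimal, hedged sketch in Python follows; the helper name set_link_aspm_state and its base parameter are made up for this example, writing needs root, and the link directory may be absent if the kernel is too old or the OS is not allowed to control ASPM on this platform.

```python
from pathlib import Path

# Standard sysfs location of PCI devices; overridable for testing.
SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")


def set_link_aspm_state(bdf, attribute, enabled, base=SYSFS_PCI_DEVICES):
    """Write 1 or 0 to <base>/<bdf>/link/<attribute> and return the path.

    bdf is the PCI address (e.g. "0000:25:00.0"); attribute is one of the
    link attributes such as "l1_aspm" or "l1_1_aspm".
    """
    attr_path = Path(base) / bdf / "link" / attribute
    if not attr_path.exists():
        # Hypothetical error text; the attribute is simply not exposed.
        raise FileNotFoundError(f"{attr_path} is not available on this system")
    attr_path.write_text("1" if enabled else "0")
    return attr_path
```

With the address from the report, disabling only L1 for this link would be set_link_aspm_state("0000:25:00.0", "l1_aspm", False). The setting does not persist across reboots, so it would have to be reapplied, for instance from a boot-time script.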

>
> If you need more info I'll do my best to provide what I can, hope that helps already.
>
> Regards,
> Bernhard




* Re: [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement
  2022-06-08  0:44 [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement Bernhard Hampel-Waffenthal
  2022-06-08  7:30 ` Heiner Kallweit
@ 2022-06-08 13:19 ` Hans de Goede
  1 sibling, 0 replies; 5+ messages in thread
From: Hans de Goede @ 2022-06-08 13:19 UTC
  To: Bernhard Hampel-Waffenthal, Heiner Kallweit
  Cc: nic_swsd, netdev, regressions, Jakub Kicinski

Hi,

On 6/8/22 02:44, Bernhard Hampel-Waffenthal wrote:
> #regzbot introduced: 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71
> 
> Hi,
> 
> since the last major kernel version upgrade to 5.18 on Arch Linux I'm unable to get a usable ethernet connection on my desktop PC.
> 
> I can see a timeout in the logs
> 
>> kernel: NETDEV WATCHDOG: enp37s0 (r8169): transmit queue 0 timed out
> 
> and regular very likely related errors after
> 
>> kernel: r8169 0000:25:00.0 enp37s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
> 
> 
> The link does manage to go up at nominal full 1Gbps speed, but there is no usable connection to speak of and pings are very bursty and take multiple seconds.
> 
> I was able to pinpoint that the problems were introduced in commit 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71 with the enablement of ASPM L1/L1.1 for ">= RTL_GIGA_MAC_VER_45", which my chip falls under. Adding pcie_aspm=off to the kernel command line or changing that check to ">= RTL_GIGA_MAC_VER_60" for testing purposes and recompiling the kernel fixes my problems.
> 
> 
> I'm using a MSI B450I GAMING PLUS AC motherboard with a RTL8168h chip as per dmesg:

Hmm, my main workstation has a "MSI B550M PRO-VDH" which is similar(ish)
to your motherboard and is using the exact same ethernet controller and
I'm not seeing any issues with 5.18.0.

ASPM issues may be BIOS related. Are you on the latest BIOS version?

And are all your (relevant) BIOS settings set to the default settings?

Regards,

Hans


> 
>> r8169 0000:25:00.0 eth0: RTL8168h/8111h, 30:9c:23:de:97:a9, XID 541, IRQ 101
> 
> lspci says:
> 
>> 25:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
>         Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:7a40]
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 30
>         IOMMU group: 14
>         Region 0: I/O ports at f000 [size=256]
>         Region 2: Memory at fcb04000 (64-bit, non-prefetchable) [size=4K]
>         Region 4: Memory at fcb00000 (64-bit, non-prefetchable) [size=16K]
>         Capabilities: <access denied>
>         Kernel driver in use: r8169
>         Kernel modules: r8169
> 
> 
> If you need more info I'll do my best to provide what I can, hope that helps already.
> 
> Regards,
> Bernhard
> 



* Re: [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement
  2022-06-08  7:30 ` Heiner Kallweit
@ 2022-06-20  6:40   ` Thorsten Leemhuis
  2022-07-04 10:56     ` [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement #forregzbot Thorsten Leemhuis
  0 siblings, 1 reply; 5+ messages in thread
From: Thorsten Leemhuis @ 2022-06-20  6:40 UTC
  To: Heiner Kallweit, Bernhard Hampel-Waffenthal
  Cc: nic_swsd, netdev, regressions, Jakub Kicinski



On 08.06.22 09:30, Heiner Kallweit wrote:
> On 08.06.2022 02:44, Bernhard Hampel-Waffenthal wrote:
>> #regzbot introduced: 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71
>>
>> since the last major kernel version upgrade to 5.18 on Arch Linux I'm unable to get a usable ethernet connection on my desktop PC.
>>
>> I can see a timeout in the logs
>>
>>> kernel: NETDEV WATCHDOG: enp37s0 (r8169): transmit queue 0 timed out
>>
>> and regular very likely related errors after
>>
>>> kernel: r8169 0000:25:00.0 enp37s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
>>
>> The link does manage to go up at nominal full 1Gbps speed, but there is no usable connection to speak of and pings are very bursty and take multiple seconds.
>>
>> I was able to pinpoint that the problems were introduced in commit 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71 with the enablement of ASPM L1/L1.1 for ">= RTL_GIGA_MAC_VER_45", which my chip falls under. Adding pcie_aspm=off to the kernel command line or changing that check to ">= RTL_GIGA_MAC_VER_60" for testing purposes and recompiling the kernel fixes my problems.
>>
>> I'm using a MSI B450I GAMING PLUS AC motherboard with a RTL8168h chip as per dmesg:
>>
>>> r8169 0000:25:00.0 eth0: RTL8168h/8111h, 30:9c:23:de:97:a9, XID 541, IRQ 101
>>
>> lspci says:
>>
>>> 25:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
>>         Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:7a40]
>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>         Latency: 0, Cache Line Size: 64 bytes
>>         Interrupt: pin A routed to IRQ 30
>>         IOMMU group: 14
>>         Region 0: I/O ports at f000 [size=256]
>>         Region 2: Memory at fcb04000 (64-bit, non-prefetchable) [size=4K]
>>         Region 4: Memory at fcb00000 (64-bit, non-prefetchable) [size=16K]
>>         Capabilities: <access denied>
>>         Kernel driver in use: r8169
>>         Kernel modules: r8169
>>
> Thanks for the report. On my test systems RTL8168h works fine with ASPM L1 and L1.1, so it seems to be
> a board-specific issue.

Well, we have already reverted changes like the one causing this when
things like ASPM caused regressions only for some users because their
HW is flaky. But I'd prefer to avoid that myself.

> Some reports in the past indicated that changing IOMMU settings may help,
> you can also use the ASPM sysfs link attributes to disable selected ASPM states for just this link.

Bernhard, did this or the suggestions from Hans help to solve the
problem for you?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.

>> If you need more info I'll do my best to provide what I can, hope that helps already.
>>
>> Regards,
>> Bernhard
> 
> 
> 
> 


* Re: [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement #forregzbot
  2022-06-20  6:40   ` Thorsten Leemhuis
@ 2022-07-04 10:56     ` Thorsten Leemhuis
  0 siblings, 0 replies; 5+ messages in thread
From: Thorsten Leemhuis @ 2022-07-04 10:56 UTC
  To: regressions

TWIMC: this mail is primarily sent for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

On 20.06.22 08:40, Thorsten Leemhuis wrote:
> On 08.06.22 09:30, Heiner Kallweit wrote:
>> On 08.06.2022 02:44, Bernhard Hampel-Waffenthal wrote:
>>> #regzbot introduced: 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71
>>>
>>> since the last major kernel version upgrade to 5.18 on Arch Linux I'm unable to get a usable ethernet connection on my desktop PC.
>>>
>>> I can see a timeout in the logs
>>>
>>>> kernel: NETDEV WATCHDOG: enp37s0 (r8169): transmit queue 0 timed out
>>>
>>> and regular very likely related errors after
>>>
>>>> kernel: r8169 0000:25:00.0 enp37s0: rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100).
>>>
>>> The link does manage to go up at nominal full 1Gbps speed, but there is no usable connection to speak of and pings are very bursty and take multiple seconds.
>>>
>>> I was able to pinpoint that the problems were introduced in commit 4b5f82f6aaef3fa95cce52deb8510f55ddda6a71 with the enablement of ASPM L1/L1.1 for ">= RTL_GIGA_MAC_VER_45", which my chip falls under. Adding pcie_aspm=off to the kernel command line or changing that check to ">= RTL_GIGA_MAC_VER_60" for testing purposes and recompiling the kernel fixes my problems.
>>>
>>> I'm using a MSI B450I GAMING PLUS AC motherboard with a RTL8168h chip as per dmesg:
>>>
>>>> r8169 0000:25:00.0 eth0: RTL8168h/8111h, 30:9c:23:de:97:a9, XID 541, IRQ 101
>>>
>>> lspci says:
>>>
>>>> 25:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
>>>         Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:7a40]
>>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>         Latency: 0, Cache Line Size: 64 bytes
>>>         Interrupt: pin A routed to IRQ 30
>>>         IOMMU group: 14
>>>         Region 0: I/O ports at f000 [size=256]
>>>         Region 2: Memory at fcb04000 (64-bit, non-prefetchable) [size=4K]
>>>         Region 4: Memory at fcb00000 (64-bit, non-prefetchable) [size=16K]
>>>         Capabilities: <access denied>
>>>         Kernel driver in use: r8169
>>>         Kernel modules: r8169
>>>
>> Thanks for the report. On my test systems RTL8168h works fine with ASPM L1 and L1.1, so it seems to be
>> a board-specific issue.
> 
> Well, we have already reverted changes like the one causing this when
> things like ASPM caused regressions only for some users because their
> HW is flaky. But I'd prefer to avoid that myself.
> 
>> Some reports in the past indicated that changing IOMMU settings may help,
>> you can also use the ASPM sysfs link attributes to disable selected ASPM states for just this link.
> 
> Bernhard, did this or the suggestions from Hans help to solve the
> problem for you?
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> 
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.

#regzbot invalid: reporter didn't reply, might be satisfied with the
help he got, and a tricky situation anyway


end of thread, other threads:[~2022-07-04 10:56 UTC | newest]

Thread overview: 5+ messages
2022-06-08  0:44 [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement Bernhard Hampel-Waffenthal
2022-06-08  7:30 ` Heiner Kallweit
2022-06-20  6:40   ` Thorsten Leemhuis
2022-07-04 10:56     ` [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement #forregzbot Thorsten Leemhuis
2022-06-08 13:19 ` [REGRESSION] r8169: RTL8168h "transmit queue 0 timed out" after ASPM L1 enablement Hans de Goede
