linux-kernel.vger.kernel.org archive mirror
* [PATCH] r8169: don't use MSI-X on RTL8106e
@ 2018-08-15  6:21 jian-hong
  2018-08-16 19:21 ` David Miller
  2018-08-17  5:07 ` [PATCH v2 net] " Jian-Hong Pan
  0 siblings, 2 replies; 20+ messages in thread
From: jian-hong @ 2018-08-15  6:21 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David S. Miller, netdev,
	linux-kernel, linux
  Cc: Jian-Hong Pan

From: Jian-Hong Pan <jian-hong@endlessm.com>

Found the ethernet network on ASUS X441UAR doesn't come back on resume
from suspend when using MSI-X.  The chip is RTL8106e - version 39.

asus@endless:~$ dmesg | grep r8169
[   21.848357] libphy: r8169: probed
[   21.848473] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
[   22.518860] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   29.458041] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   63.227398] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
[  124.514648] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)

Here is the ethernet controller in detail:

asus@endless:~$ sudo lspci -nnvs 02:00.0
[sudo] password for asus:
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07)
	Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet controller [1043:200f]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at e000 [size=256]
	Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
	Memory at e0000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
	Capabilities: [170] Latency Tolerance Reporting
	Kernel driver in use: r8169
	Kernel modules: r8169

Here is the system interrupt table:

asus@endless:~$ cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   0:         22          0          0          0   IO-APIC    2-edge      timer
   1:        157         42          0          0   IO-APIC    1-edge      i8042
   8:          0          0          1          0   IO-APIC    8-edge      rtc0
   9:         10         13          0          0   IO-APIC    9-fasteoi   acpi
  16:          0          0          0          0   IO-APIC   16-fasteoi   i2c_designware.0, i801_smbus
  17:       2445          0       3453          0   IO-APIC   17-fasteoi   i2c_designware.1, rtl_pci
 109:          2          0          0          1   IO-APIC  109-fasteoi   FTE1200:00
 120:          0          0          0          0   PCI-MSI 458752-edge      PCIe PME
 121:          0          0          0          0   PCI-MSI 466944-edge      PCIe PME
 122:          0          0          0          0   PCI-MSI 468992-edge      PCIe PME
 123:       1465          0          0      21263   PCI-MSI 376832-edge      ahci[0000:00:17.0]
 124:          0        530          0          0   PCI-MSI 327680-edge      xhci_hcd
 125:       5204          0          0          0   PCI-MSI 32768-edge      i915
 126:          0          0        149          0   PCI-MSI 514048-edge      snd_hda_intel:card0
 127:          0          0        337          0   PCI-MSI 1048576-edge      enp2s0
 NMI:          0          0          0          0   Non-maskable interrupts
 LOC:      45049      39474      38978      46677   Local timer interrupts
 SPU:          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0   Performance monitoring interrupts
 IWI:        619          8          0          1   IRQ work interrupts
 RTR:          6          0          0          0   APIC ICR read retries
 RES:       4918       4436       3835       2943   Rescheduling interrupts
 CAL:       1399       1478       1598       1465   Function call interrupts
 TLB:        608        513        723        559   TLB shootdowns
 TRM:          0          0          0          0   Thermal event interrupts
 THR:          0          0          0          0   Threshold APIC interrupts
 DFR:          0          0          0          0   Deferred Error APIC interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          3          4          4          4   Machine check polls
 ERR:          0
 MIS:          0
 PIN:          0          0          0          0   Posted-interrupt notification event
 NPI:          0          0          0          0   Nested posted-interrupt event
 PIW:          0          0          0          0   Posted-interrupt wakeup event

IRQ 127, the one used by enp2s0, is listed as PCI-MSI.  However, lspci shows
MSI disabled and MSI-X enabled, which conflicts with the interrupt table.
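
The comparison above can be repeated mechanically. What follows is a minimal sketch, assuming the text of "lspci -v" for one device is already in memory as a single string; scan_lspci_caps() and struct irq_mode are hypothetical names made up for this note and are not part of pciutils or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Interrupt modes that an lspci capability listing reports as enabled. */
struct irq_mode {
	bool msi;   /* a "MSI: Enable+" capability line was seen   */
	bool msix;  /* a "MSI-X: Enable+" capability line was seen */
};

/*
 * Scan lspci -v output (one newline-separated string) for the MSI and
 * MSI-X capability lines. Hypothetical helper, written for illustration.
 */
struct irq_mode scan_lspci_caps(const char *lspci_text)
{
	struct irq_mode mode = { false, false };

	assert(lspci_text != NULL);

	/* "MSI-X: ..." can never match "MSI: ..." because of the "-X",
	 * and "Express Endpoint, MSI 01" has no colon after "MSI". */
	mode.msi = strstr(lspci_text, "MSI: Enable+") != NULL;
	mode.msix = strstr(lspci_text, "MSI-X: Enable+") != NULL;
	return mode;
}
```

Fed the capability lines from the listing before the patch, this would report msix set and msi clear; fed the listing taken with the patch applied, the two swap.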

Falling back to MSI fixes the issue.

Here is the test result with this patch in dmesg:

asus@endless:~$ dmesg | grep r8169
[   22.017477] libphy: r8169: probed
[   22.017735] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
[   22.041489] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   29.138312] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   30.927359] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
[  289.998077] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
[  290.508084] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[  290.745690] r8169 0000:02:00.0 enp2s0: Link is Down
[  292.367717] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off

With this patch, lspci shows MSI enabled and MSI-X disabled:

asus@endless:~/linux-net$ sudo lspci -nnvs 02:00.0
[sudo] password for asus:
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07)
	Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet controller [1043:200f]
	Flags: bus master, fast devsel, latency 0, IRQ 127
	I/O ports at e000 [size=256]
	Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
	Memory at e0000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [70] Express Endpoint, MSI 01
	Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
	Capabilities: [d0] Vital Product Data
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Virtual Channel
	Capabilities: [160] Device Serial Number 01-00-00-00-36-4c-e0-00
	Capabilities: [170] Latency Tolerance Reporting
	Kernel driver in use: r8169
	Kernel modules: r8169

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
---
 drivers/net/ethernet/realtek/r8169.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0d9c3831838f..0efa977c422d 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7071,17 +7071,20 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
 {
 	unsigned int flags;
 
-	if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
+	switch (tp->mac_version) {
+	case RTL_GIGA_MAC_VER_01 ... RTL_GIGA_MAC_VER_06:
 		RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
 		RTL_W8(tp, Cfg9346, Cfg9346_Lock);
 		flags = PCI_IRQ_LEGACY;
-	} else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
+		break;
+	case RTL_GIGA_MAC_VER_39 ... RTL_GIGA_MAC_VER_40:
 		/* This version was reported to have issues with resume
 		 * from suspend when using MSI-X
 		 */
 		flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
-	} else {
+		break;
+	default:
 		flags = PCI_IRQ_ALL_TYPES;
 	}
 
-- 
2.11.0
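
The selection logic in the hunk above can be exercised outside the kernel. This is only a sketch under stated assumptions: the SK_IRQ_* constants are stand-ins for the kernel's PCI_IRQ_* flags with bit values picked for this example, sketch_irq_flags() is a hypothetical name, and its argument is the nn suffix of RTL_GIGA_MAC_VER_nn rather than the enum's numeric value. It uses plain comparisons instead of the GNU case-range extension that the patch uses, and it leaves out the Config2 register write done for the oldest chips.

```c
#include <assert.h>

/* Stand-ins for the kernel's PCI_IRQ_* flags. The bit values are
 * assumptions chosen for this sketch. */
enum {
	SK_IRQ_LEGACY = 1 << 0,
	SK_IRQ_MSI    = 1 << 1,
	SK_IRQ_MSIX   = 1 << 2,
	SK_IRQ_ALL    = SK_IRQ_LEGACY | SK_IRQ_MSI | SK_IRQ_MSIX,
};

/*
 * Return the interrupt types the driver would be willing to use.
 * mac_version is the "nn" in RTL_GIGA_MAC_VER_nn (assumed >= 1).
 */
unsigned int sketch_irq_flags(int mac_version)
{
	assert(mac_version >= 1);

	/* VER_01 .. VER_06: legacy interrupts only */
	if (mac_version <= 6)
		return SK_IRQ_LEGACY;

	/* VER_39 and VER_40: everything except MSI-X */
	if (mac_version == 39 || mac_version == 40)
		return SK_IRQ_LEGACY | SK_IRQ_MSI;

	/* all other versions: let the PCI core pick the best type */
	return SK_IRQ_ALL;
}
```

With these stand-ins, version 39 no longer has the MSI-X bit set, while version 41 still returns all three types.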



* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-15  6:21 [PATCH] r8169: don't use MSI-X on RTL8106e jian-hong
@ 2018-08-16 19:21 ` David Miller
  2018-08-16 19:37   ` Heiner Kallweit
  2018-08-17  5:07 ` [PATCH v2 net] " Jian-Hong Pan
  1 sibling, 1 reply; 20+ messages in thread
From: David Miller @ 2018-08-16 19:21 UTC (permalink / raw)
  To: jian-hong; +Cc: nic_swsd, netdev, hkallweit1, linux-kernel, linux

From: <jian-hong@endlessm.com>
Date: Wed, 15 Aug 2018 14:21:10 +0800

> Found the ethernet network on ASUS X441UAR doesn't come back on resume
> from suspend when using MSI-X.  The chip is RTL8106e - version 39.

Heiner, please take a look at this.

You recently disabled MSI-X on RTL8168g for similar reasons.

Now that we've seen two chips like this, maybe there is some other
problem afoot.

Thanks.


* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-16 19:21 ` David Miller
@ 2018-08-16 19:37   ` Heiner Kallweit
  2018-08-16 19:39     ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Heiner Kallweit @ 2018-08-16 19:37 UTC (permalink / raw)
  To: David Miller, jian-hong; +Cc: nic_swsd, netdev, linux-kernel, linux

On 16.08.2018 21:21, David Miller wrote:
> From: <jian-hong@endlessm.com>
> Date: Wed, 15 Aug 2018 14:21:10 +0800
> 
>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
> 
> Heiner, please take a look at this.
> 
> You recently disabled MSI-X on RTL8168g for similar reasons.
> 
> Now that we've seen two chips like this, maybe there is some other
> problem afoot.
> 
Thanks for the hint. I saw it already and just contacted Realtek
whether they are aware of any MSI-X issues with particular chip
versions. With the chip versions I have access to MSI-X works fine.

There's also the theoretical option that the issues are caused by
broken BIOS's. But so far only chip versions have been reported
which are very similar, at least with regard to version number
(2x VER_40, 1x VER_39). So they may share some buggy component.

Let's see whether Realtek can provide some hint.
If more chip versions are reported having problems with MSI-X,
then we could switch to a whitelist or disable MSI-X in general.

Heiner
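
The whitelist option mentioned above inverts the default: MSI-X is used only for versions that are explicitly listed, and everything else falls back. A minimal sketch, assuming the list is passed in by the caller; msix_allowed() is a hypothetical name, and no particular chip version is claimed to belong on such a list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true only if ver appears in the caller-supplied allowlist.
 * An empty list means MSI-X is never used.
 */
bool msix_allowed(int ver, const int *allow, size_t n)
{
	assert(allow != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (allow[i] == ver)
			return true;
	}
	return false;
}
```

Compared with excluding individual versions, an unknown or newly added chip version would start out without MSI-X until someone has tested it.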

> Thanks.
> 



* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-16 19:37   ` Heiner Kallweit
@ 2018-08-16 19:39     ` David Miller
  2018-08-16 19:50       ` Heiner Kallweit
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2018-08-16 19:39 UTC (permalink / raw)
  To: hkallweit1; +Cc: jian-hong, nic_swsd, netdev, linux-kernel, linux

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Thu, 16 Aug 2018 21:37:31 +0200

> On 16.08.2018 21:21, David Miller wrote:
>> From: <jian-hong@endlessm.com>
>> Date: Wed, 15 Aug 2018 14:21:10 +0800
>> 
>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
>> 
>> Heiner, please take a look at this.
>> 
>> You recently disabled MSI-X on RTL8168g for similar reasons.
>> 
>> Now that we've seen two chips like this, maybe there is some other
>> problem afoot.
>> 
> Thanks for the hint. I saw it already and just contacted Realtek
> whether they are aware of any MSI-X issues with particular chip
> versions. With the chip versions I have access to MSI-X works fine.
> 
> There's also the theoretical option that the issues are caused by
> broken BIOS's. But so far only chip versions have been reported
> which are very similar, at least with regard to version number
> (2x VER_40, 1x VER_39). So they may share some buggy component.
> 
> Let's see whether Realtek can provide some hint.
> If more chip versions are reported having problems with MSI-X,
> then we could switch to a whitelist or disable MSI-X in general.

It could be that we need to reprogram some register(s) on resume,
which normally might not be needed, and that is what is causing the
problem with some chips.


* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-16 19:39     ` David Miller
@ 2018-08-16 19:50       ` Heiner Kallweit
  2018-08-20 18:44         ` Bjorn Helgaas
  2018-08-20 20:40         ` Florian Fainelli
  0 siblings, 2 replies; 20+ messages in thread
From: Heiner Kallweit @ 2018-08-16 19:50 UTC (permalink / raw)
  To: David Miller; +Cc: jian-hong, nic_swsd, netdev, linux-kernel, linux

On 16.08.2018 21:39, David Miller wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
> Date: Thu, 16 Aug 2018 21:37:31 +0200
> 
>> On 16.08.2018 21:21, David Miller wrote:
>>> From: <jian-hong@endlessm.com>
>>> Date: Wed, 15 Aug 2018 14:21:10 +0800
>>>
>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>>>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
>>>
>>> Heiner, please take a look at this.
>>>
>>> You recently disabled MSI-X on RTL8168g for similar reasons.
>>>
>>> Now that we've seen two chips like this, maybe there is some other
>>> problem afoot.
>>>
>> Thanks for the hint. I saw it already and just contacted Realtek
>> whether they are aware of any MSI-X issues with particular chip
>> versions. With the chip versions I have access to MSI-X works fine.
>>
>> There's also the theoretical option that the issues are caused by
>> broken BIOS's. But so far only chip versions have been reported
>> which are very similar, at least with regard to version number
>> (2x VER_40, 1x VER_39). So they may share some buggy component.
>>
>> Let's see whether Realtek can provide some hint.
>> If more chip versions are reported having problems with MSI-X,
>> then we could switch to a whitelist or disable MSI-X in general.
> 
> It could be that we need to reprogram some register(s) on resume,
> which normally might not be needed, and that is what is causing the
> problem with some chips.
> 
Indeed. That's what I'm checking with Realtek.
In the register list in the r8169 driver there's one entry which
seems to indicate that there are MSI-X specific settings.
However this register isn't used, and the r8168 vendor driver
uses only MSI. And there are no public datasheets.


* [PATCH v2 net] r8169: don't use MSI-X on RTL8106e
  2018-08-15  6:21 [PATCH] r8169: don't use MSI-X on RTL8106e jian-hong
  2018-08-16 19:21 ` David Miller
@ 2018-08-17  5:07 ` Jian-Hong Pan
  2018-08-19 18:01   ` David Miller
  1 sibling, 1 reply; 20+ messages in thread
From: Jian-Hong Pan @ 2018-08-17  5:07 UTC (permalink / raw)
  To: Heiner Kallweit, David Miller, nic_swsd, netdev, linux-kernel, linux
  Cc: Jian-Hong Pan

Found the ethernet network on ASUS X441UAR doesn't come back on resume
from suspend when using MSI-X.  The chip is RTL8106e - version 39.

[   21.848357] libphy: r8169: probed
[   21.848473] r8169 0000:02:00.0 eth0: RTL8106e, 0c:9d:92:32:67:b4, XID 44900000, IRQ 127
[   22.518860] r8169 0000:02:00.0 enp2s0: renamed from eth0
[   29.458041] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)
[   63.227398] r8169 0000:02:00.0 enp2s0: Link is Up - 100Mbps/Full - flow control off
[  124.514648] Generic PHY r8169-200:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-200:00, irq=IGNORE)

Here is the ethernet controller in detail:

02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136] (rev 07)
	Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet controller [1043:200f]
	Flags: bus master, fast devsel, latency 0, IRQ 16
	I/O ports at e000 [size=256]
	Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
	Memory at e0000000 (64-bit, prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: r8169
	Kernel modules: r8169

Falling back to MSI fixes the issue.

Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
---
Changes in v2:
  - Make the commit message shorter
  - Add "Fixes" tag in the commit message

 drivers/net/ethernet/realtek/r8169.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0d9c3831838f..0efa977c422d 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7071,17 +7071,20 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
 {
 	unsigned int flags;
 
-	if (tp->mac_version <= RTL_GIGA_MAC_VER_06) {
+	switch (tp->mac_version) {
+	case RTL_GIGA_MAC_VER_01 ... RTL_GIGA_MAC_VER_06:
 		RTL_W8(tp, Cfg9346, Cfg9346_Unlock);
 		RTL_W8(tp, Config2, RTL_R8(tp, Config2) & ~MSIEnable);
 		RTL_W8(tp, Cfg9346, Cfg9346_Lock);
 		flags = PCI_IRQ_LEGACY;
-	} else if (tp->mac_version == RTL_GIGA_MAC_VER_40) {
+		break;
+	case RTL_GIGA_MAC_VER_39 ... RTL_GIGA_MAC_VER_40:
 		/* This version was reported to have issues with resume
 		 * from suspend when using MSI-X
 		 */
 		flags = PCI_IRQ_LEGACY | PCI_IRQ_MSI;
-	} else {
+		break;
+	default:
 		flags = PCI_IRQ_ALL_TYPES;
 	}
 
-- 
2.11.0



* Re: [PATCH v2 net] r8169: don't use MSI-X on RTL8106e
  2018-08-17  5:07 ` [PATCH v2 net] " Jian-Hong Pan
@ 2018-08-19 18:01   ` David Miller
  0 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2018-08-19 18:01 UTC (permalink / raw)
  To: jian-hong; +Cc: hkallweit1, nic_swsd, netdev, linux-kernel, linux

From: Jian-Hong Pan <jian-hong@endlessm.com>
Date: Fri, 17 Aug 2018 13:07:35 +0800

> Found the ethernet network on ASUS X441UAR doesn't come back on resume
> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
 ...
> Here is the ethernet controller in detail:
 ...
> Falling back to MSI fixes the issue.
> 
> Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> ---
> Changes in v2:
>   - Make the commit message shorter
>   - Add "Fixes" tag in the commit message

I'm going to apply this for now, and queue it up for -stable.

If we hear back from Realtek on something we can do to make MSI-X
work on these chips, we can deal with it as a follow-up.

Thanks.


* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-16 19:50       ` Heiner Kallweit
@ 2018-08-20 18:44         ` Bjorn Helgaas
  2018-08-20 20:46           ` Heiner Kallweit
  2018-08-21  8:28           ` Marc Zyngier
  2018-08-20 20:40         ` Florian Fainelli
  1 sibling, 2 replies; 20+ messages in thread
From: Bjorn Helgaas @ 2018-08-20 18:44 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: David Miller, jian-hong, nic_swsd, netdev, linux-kernel, linux,
	linux-pci, Marc Zyngier, Thomas Gleixner, Christoph Hellwig

[+cc Marc, Thomas, Christoph, linux-pci]
(beginning of thread at [1])

On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote:
> On 16.08.2018 21:39, David Miller wrote:
> > From: Heiner Kallweit <hkallweit1@gmail.com>
> > Date: Thu, 16 Aug 2018 21:37:31 +0200
> > 
> >> On 16.08.2018 21:21, David Miller wrote:
> >>> From: <jian-hong@endlessm.com>
> >>> Date: Wed, 15 Aug 2018 14:21:10 +0800
> >>>
> >>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
> >>>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
> >>>
> >>> Heiner, please take a look at this.
> >>>
> >>> You recently disabled MSI-X on RTL8168g for similar reasons.
> >>>
> >>> Now that we've seen two chips like this, maybe there is some other
> >>> problem afoot.
> >>>
> >> Thanks for the hint. I saw it already and just contacted Realtek
> >> whether they are aware of any MSI-X issues with particular chip
> >> versions. With the chip versions I have access to MSI-X works fine.
> >>
> >> There's also the theoretical option that the issues are caused by
> >> broken BIOS's. But so far only chip versions have been reported
> >> which are very similar, at least with regard to version number
> >> (2x VER_40, 1x VER_39). So they may share some buggy component.
> >>
> >> Let's see whether Realtek can provide some hint.
> >> If more chip versions are reported having problems with MSI-X,
> >> then we could switch to a whitelist or disable MSI-X in general.
> > 
> > It could be that we need to reprogram some register(s) on resume,
> > which normally might not be needed, and that is what is causing the
> > problem with some chips.
> > 
> Indeed. That's what I'm checking with Realtek.
> In the register list in the r8169 driver there's one entry which
> seems to indicate that there are MSI-X specific settings.
> However this register isn't used, and the r8168 vendor driver
> uses only MSI. And there are no public datasheets.

Do we have any information about these chip versions in other systems?
Or other devices using MSI-X in the same ASUS system?  It seems
possible that there's some PCI core or suspend/resume issue with MSI-X
and this patch just avoids it without fixing the root cause.

It might be useful to have a kernel.org bugzilla with the complete
dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived
for future reference.

[1] https://lkml.kernel.org/r/20180815062110.16155-1-jian-hong@endlessm.com


* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-16 19:50       ` Heiner Kallweit
  2018-08-20 18:44         ` Bjorn Helgaas
@ 2018-08-20 20:40         ` Florian Fainelli
  2018-08-20 20:56           ` Heiner Kallweit
  1 sibling, 1 reply; 20+ messages in thread
From: Florian Fainelli @ 2018-08-20 20:40 UTC (permalink / raw)
  To: Heiner Kallweit, David Miller
  Cc: jian-hong, nic_swsd, netdev, linux-kernel, linux

On 08/16/2018 12:50 PM, Heiner Kallweit wrote:
> On 16.08.2018 21:39, David Miller wrote:
>> From: Heiner Kallweit <hkallweit1@gmail.com>
>> Date: Thu, 16 Aug 2018 21:37:31 +0200
>>
>>> On 16.08.2018 21:21, David Miller wrote:
>>>> From: <jian-hong@endlessm.com>
>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800
>>>>
>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>>>>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
>>>>
>>>> Heiner, please take a look at this.
>>>>
>>>> You recently disabled MSI-X on RTL8168g for similar reasons.
>>>>
>>>> Now that we've seen two chips like this, maybe there is some other
>>>> problem afoot.
>>>>
>>> Thanks for the hint. I saw it already and just contacted Realtek
>>> whether they are aware of any MSI-X issues with particular chip
>>> versions. With the chip versions I have access to MSI-X works fine.
>>>
>>> There's also the theoretical option that the issues are caused by
>>> broken BIOS's. But so far only chip versions have been reported
>>> which are very similar, at least with regard to version number
>>> (2x VER_40, 1x VER_39). So they may share some buggy component.
>>>
>>> Let's see whether Realtek can provide some hint.
>>> If more chip versions are reported having problems with MSI-X,
>>> then we could switch to a whitelist or disable MSI-X in general.
>>
>> It could be that we need to reprogram some register(s) on resume,
>> which normally might not be needed, and that is what is causing the
>> problem with some chips.
>>
> Indeed. That's what I'm checking with Realtek.
> In the register list in the r8169 driver there's one entry which
> seems to indicate that there are MSI-X specific settings.
> However this register isn't used, and the r8168 vendor driver
> uses only MSI. And there are no public datasheets.

Stupid question, but shouldn't we ask the reporter to try again with:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfdd19ad80f203f42f05fd32a31c678c9c524ef9

applied? The original report shows the Generic PHY driver being used rather
than the Realtek PHY driver. Could that be contributing to the problem?
-- 
Florian


* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-20 18:44         ` Bjorn Helgaas
@ 2018-08-20 20:46           ` Heiner Kallweit
  2018-08-21 19:31             ` David Miller
  2018-08-21  8:28           ` Marc Zyngier
  1 sibling, 1 reply; 20+ messages in thread
From: Heiner Kallweit @ 2018-08-20 20:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: David Miller, jian-hong, nic_swsd, netdev, linux-kernel, linux,
	linux-pci, Marc Zyngier, Thomas Gleixner, Christoph Hellwig

On 20.08.2018 20:44, Bjorn Helgaas wrote:
> [+cc Marc, Thomas, Christoph, linux-pci]
> (beginning of thread at [1])
> 
> On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote:
>> On 16.08.2018 21:39, David Miller wrote:
>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>> Date: Thu, 16 Aug 2018 21:37:31 +0200
>>>
>>>> On 16.08.2018 21:21, David Miller wrote:
>>>>> From: <jian-hong@endlessm.com>
>>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800
>>>>>
>>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>>>>>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
>>>>>
>>>>> Heiner, please take a look at this.
>>>>>
>>>>> You recently disabled MSI-X on RTL8168g for similar reasons.
>>>>>
>>>>> Now that we've seen two chips like this, maybe there is some other
>>>>> problem afoot.
>>>>>
>>>> Thanks for the hint. I saw it already and just contacted Realtek
>>>> whether they are aware of any MSI-X issues with particular chip
>>>> versions. With the chip versions I have access to MSI-X works fine.
>>>>
>>>> There's also the theoretical option that the issues are caused by
>>>> broken BIOS's. But so far only chip versions have been reported
>>>> which are very similar, at least with regard to version number
>>>> (2x VER_40, 1x VER_39). So they may share some buggy component.
>>>>
>>>> Let's see whether Realtek can provide some hint.
>>>> If more chip versions are reported having problems with MSI-X,
>>>> then we could switch to a whitelist or disable MSI-X in general.
>>>
>>> It could be that we need to reprogram some register(s) on resume,
>>> which normally might not be needed, and that is what is causing the
>>> problem with some chips.
>>>
>> Indeed. That's what I'm checking with Realtek.
>> In the register list in the r8169 driver there's one entry which
>> seems to indicate that there are MSI-X specific settings.
>> However this register isn't used, and the r8168 vendor driver
>> uses only MSI. And there are no public datasheets.
> 
> Do we have any information about these chip versions in other systems?
> Or other devices using MSI-X in the same ASUS system?  It seems
> possible that there's some PCI core or suspend/resume issue with MSI-X
> and this patch just avoids it without fixing the root cause.
> 
I'm in contact with Realtek, and according to them a few chip versions
seem to clear the MSI-X table entries on resume from suspend. I'm checking
with them how this could be fixed or worked around.
In the worst case we may have to disable MSI-X in general.
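
To make the "table entries cleared on resume" scenario concrete, here is a sketch of keeping a shadow copy of the MSI-X table and rewriting entries that come back zeroed. It is an illustration only, not the fix discussed in this thread: it works on ordinary arrays, whereas the real table lives in device memory and would be accessed through MMIO helpers, and every name in it is hypothetical. The only detail taken from the PCI specification is that each table entry consists of four 32-bit words (message address low and high, message data, vector control).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One MSI-X table entry: four 32-bit words. */
struct msix_entry_words {
	uint32_t addr_lo;     /* message address, lower 32 bits */
	uint32_t addr_hi;     /* message address, upper 32 bits */
	uint32_t data;        /* message data                   */
	uint32_t vector_ctrl; /* bit 0 masks the vector         */
};

/* Copy the live table into a shadow buffer, e.g. before suspend. */
void msix_shadow_save(struct msix_entry_words *shadow,
		      const struct msix_entry_words *table, size_t n)
{
	for (size_t i = 0; i < n; i++)
		shadow[i] = table[i];
}

/* Heuristic used by this sketch: zero address and data means "cleared". */
bool msix_entry_cleared(const struct msix_entry_words *e)
{
	return e->addr_lo == 0 && e->addr_hi == 0 && e->data == 0;
}

/*
 * After resume, rewrite every entry that was programmed before suspend
 * but now reads back as cleared. Returns the number of entries restored.
 */
size_t msix_shadow_restore(struct msix_entry_words *table,
			   const struct msix_entry_words *shadow, size_t n)
{
	size_t restored = 0;

	assert(table != NULL && shadow != NULL);

	for (size_t i = 0; i < n; i++) {
		if (msix_entry_cleared(&table[i]) &&
		    !msix_entry_cleared(&shadow[i])) {
			table[i] = shadow[i];
			restored++;
		}
	}
	return restored;
}
```

For a device advertising four vectors, as the lspci output earlier in the thread does (Count=4), the shadow would hold four entries; a second restore call after a successful one finds nothing left to do.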

> It might be useful to have a kernel.org bugzilla with the complete
> dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived
> for future reference.
> 
> [1] https://lkml.kernel.org/r/20180815062110.16155-1-jian-hong@endlessm.com
> 



* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-20 20:40         ` Florian Fainelli
@ 2018-08-20 20:56           ` Heiner Kallweit
  0 siblings, 0 replies; 20+ messages in thread
From: Heiner Kallweit @ 2018-08-20 20:56 UTC (permalink / raw)
  To: Florian Fainelli, David Miller
  Cc: jian-hong, nic_swsd, netdev, linux-kernel, linux

On 20.08.2018 22:40, Florian Fainelli wrote:
> On 08/16/2018 12:50 PM, Heiner Kallweit wrote:
>> On 16.08.2018 21:39, David Miller wrote:
>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>> Date: Thu, 16 Aug 2018 21:37:31 +0200
>>>
>>>> On 16.08.2018 21:21, David Miller wrote:
>>>>> From: <jian-hong@endlessm.com>
>>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800
>>>>>
>>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>>>>>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
>>>>>
>>>>> Heiner, please take a look at this.
>>>>>
>>>>> You recently disabled MSI-X on RTL8168g for similar reasons.
>>>>>
>>>>> Now that we've seen two chips like this, maybe there is some other
>>>>> problem afoot.
>>>>>
>>>> Thanks for the hint. I saw it already and just contacted Realtek
>>>> whether they are aware of any MSI-X issues with particular chip
>>>> versions. With the chip versions I have access to MSI-X works fine.
>>>>
>>>> There's also the theoretical option that the issues are caused by
>>>> broken BIOS's. But so far only chip versions have been reported
>>>> which are very similar, at least with regard to version number
>>>> (2x VER_40, 1x VER_39). So they may share some buggy component.
>>>>
>>>> Let's see whether Realtek can provide some hint.
>>>> If more chip versions are reported having problems with MSI-X,
>>>> then we could switch to a whitelist or disable MSI-X in general.
>>>
>>> It could be that we need to reprogram some register(s) on resume,
>>> which normally might not be needed, and that is what is causing the
>>> problem with some chips.
>>>
>> Indeed. That's what I'm checking with Realtek.
>> In the register list in the r8169 driver there's one entry which
>> seems to indicate that there are MSI-X specific settings.
>> However this register isn't used, and the r8168 vendor driver
>> uses only MSI. And there are no public datasheets.
> 
> Stupid question, but should not we be asking the reporter to try again with:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bfdd19ad80f203f42f05fd32a31c678c9c524ef9
> 
> applied? The original report shows the Generic PHY being used, not the
> Realtek PHY driver being used, is this possibly contributing to the problem?
> 
I don't think it's related, because falling back to MSI fixes the issue for
the reporter. Also, some chip versions report a generic Realtek PHY ID which
isn't covered by any Realtek PHY driver, and these chip versions seem to work
fine with the generic PHY driver. So he may or may not have the Realtek PHY
drivers enabled. But indeed, it would be good to have this info to get the
full picture.

See also the mail I wrote a few minutes ago, which describes what we know
so far about the cause of the MSI-X issue.


* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-20 18:44         ` Bjorn Helgaas
  2018-08-20 20:46           ` Heiner Kallweit
@ 2018-08-21  8:28           ` Marc Zyngier
  2018-08-21 20:54             ` Heiner Kallweit
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2018-08-21  8:28 UTC (permalink / raw)
  To: Bjorn Helgaas, Heiner Kallweit, jian-hong
  Cc: David Miller, nic_swsd, netdev, linux-kernel, linux, linux-pci,
	Thomas Gleixner, Christoph Hellwig

On 20/08/18 19:44, Bjorn Helgaas wrote:
> [+cc Marc, Thomas, Christoph, linux-pci]
> (beginning of thread at [1])
> 
> On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote:
>> On 16.08.2018 21:39, David Miller wrote:
>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>> Date: Thu, 16 Aug 2018 21:37:31 +0200
>>>
>>>> On 16.08.2018 21:21, David Miller wrote:
>>>>> From: <jian-hong@endlessm.com>
>>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800
>>>>>
>>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>>>>>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
>>>>>
>>>>> Heiner, please take a look at this.
>>>>>
>>>>> You recently disabled MSI-X on RTL8168g for similar reasons.
>>>>>
>>>>> Now that we've seen two chips like this, maybe there is some other
>>>>> problem afoot.
>>>>>
>>>> Thanks for the hint. I saw it already and just contacted Realtek
>>>> whether they are aware of any MSI-X issues with particular chip
>>>> versions. With the chip versions I have access to MSI-X works fine.
>>>>
>>>> There's also the theoretical option that the issues are caused by
>>>> broken BIOS's. But so far only chip versions have been reported
>>>> which are very similar, at least with regard to version number
>>>> (2x VER_40, 1x VER_39). So they may share some buggy component.
>>>>
>>>> Let's see whether Realtek can provide some hint.
>>>> If more chip versions are reported having problems with MSI-X,
>>>> then we could switch to a whitelist or disable MSI-X in general.
>>>
>>> It could be that we need to reprogram some register(s) on resume,
>>> which normally might not be needed, and that is what is causing the
>>> problem with some chips.
>>>
>> Indeed. That's what I'm checking with Realtek.
>> In the register list in the r8169 driver there's one entry which
>> seems to indicate that there are MSI-X specific settings.
>> However this register isn't used, and the r8168 vendor driver
>> uses only MSI. And there are no public datasheets.
> 
> Do we have any information about these chip versions in other systems?
> Or other devices using MSI-X in the same ASUS system?  It seems
> possible that there's some PCI core or suspend/resume issue with MSI-X
> and this patch just avoids it without fixing the root cause.
> 
> It might be useful to have a kernel.org bugzilla with the complete
> dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived
> for future reference.

The one system I have with a Realtek chip seems happy enough with MSI-X,
but it never gets suspended. There is a comment in the patch that I don't
quite get:

> It is the IRQ 127 - PCI-MSI used by enp2s0.  However, lspci lists MSI is
> disabled and MSI-X is enabled which conflicts to the interrupt table.

What do you mean by "conflicts"? With what? Another question is whether
you've loaded any firmware (some versions of the Realtek HW seem to require
it).

For posterity, here is some data from my own system; I don't know whether it
has any relevance to the problem at hand.

Thanks,

	M.

[    2.624963] r8169 0000:02:00.0 eth0: RTL8168g/8111g, 5a:fe:ad:ce:11:00, XID 4c000800, IRQ 26
[    2.633398] r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]

 26:         50     997005          0          0       MSI 1048576 Edge      enp2s0

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
	Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 25
	Region 0: I/O ports at 1000 [size=256]
	Region 2: Memory at 100004000 (64-bit, prefetchable) [size=4K]
	Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800
	Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
		Not readable
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Capabilities: [170 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Kernel driver in use: r8169


-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-20 20:46           ` Heiner Kallweit
@ 2018-08-21 19:31             ` David Miller
  2018-08-21 20:48               ` Heiner Kallweit
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2018-08-21 19:31 UTC (permalink / raw)
  To: hkallweit1
  Cc: helgaas, jian-hong, nic_swsd, netdev, linux-kernel, linux,
	linux-pci, marc.zyngier, tglx, hch

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Mon, 20 Aug 2018 22:46:48 +0200

> I'm in contact with Realtek and according to them few chip versions
> seem to clear MSI-X table entries on resume from suspend. Checking
> with them how this could be fixed / worked around.
> Worst case we may have to disable MSI-X in general.

I worry that if the chip does this, and somehow MSI-X is enabled and
an interrupt is generated, the chip will write to the cleared out
MSI-X address.  This will either write garbage into memory or cause
a bus error and require PCI error recovery.

It also looks like your test patch doesn't fix things for people who
have tested it.

Hmmm...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-21 19:31             ` David Miller
@ 2018-08-21 20:48               ` Heiner Kallweit
  2018-08-22 11:44                 ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Heiner Kallweit @ 2018-08-21 20:48 UTC (permalink / raw)
  To: David Miller
  Cc: helgaas, jian-hong, nic_swsd, netdev, linux-kernel, linux,
	linux-pci, marc.zyngier, tglx, hch

On 21.08.2018 21:31, David Miller wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
> Date: Mon, 20 Aug 2018 22:46:48 +0200
> 
>> I'm in contact with Realtek and according to them few chip versions
>> seem to clear MSI-X table entries on resume from suspend. Checking
>> with them how this could be fixed / worked around.
>> Worst case we may have to disable MSI-X in general.
> 
> I worry that if the chip does this, and somehow MSI-X is enabled and
> an interrupt is generated, the chip will write to the cleared out
> MSI-X address.  This will either write garbage into memory or cause
> a bus error and require PCI error recovery.
> 
> It also looks like your test patch doesn't fix things for people who
> have tested it.
> 
The test patch was based on the first info from Realtek, which made me
think that the base address of the MSI-X table is cleared, which
obviously is not the case.

After some further tests it seems that the solution isn't as simple
as storing the MSI-X table entries on suspend and restoring them on
resume. On my system (where MSI-X works fine) the MSI-X table entries
on resume are partially different from the ones on suspend.
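
To illustrate the kind of comparison described above, here is a minimal
Python sketch that decodes two raw snapshots of an MSI-X table and reports
entries that were cleared or changed. The 16-byte entry layout (message
address low/high, message data, vector control) follows the PCI
specification. How the raw bytes are captured is outside the sketch, and
the function names and sample values are hypothetical, not taken from the
r8169 driver.

```python
import struct
from typing import List, NamedTuple

ENTRY_SIZE = 16  # bytes per MSI-X table entry


class MsixEntry(NamedTuple):
    addr: int     # 64-bit message address
    data: int     # 32-bit message data
    masked: bool  # bit 0 of the vector control word


def parse_msix_table(raw: bytes) -> List[MsixEntry]:
    """Decode consecutive little-endian MSI-X table entries."""
    entries = []
    usable = len(raw) - len(raw) % ENTRY_SIZE
    for off in range(0, usable, ENTRY_SIZE):
        lo, hi, data, ctrl = struct.unpack_from("<IIII", raw, off)
        entries.append(MsixEntry((hi << 32) | lo, data, bool(ctrl & 1)))
    return entries


def report_changes(before: bytes, after: bytes) -> List[str]:
    """Describe entries that differ between two snapshots."""
    lines = []
    pairs = zip(parse_msix_table(before), parse_msix_table(after))
    for i, (b, a) in enumerate(pairs):
        if a == b:
            continue
        if a.addr == 0 and a.data == 0:
            lines.append(f"entry {i}: cleared")
        else:
            lines.append(f"entry {i}: changed")
    return lines


if __name__ == "__main__":
    # Hypothetical snapshot: one programmed entry, one unused masked entry.
    before = struct.pack("<IIII", 0xFEE00000, 0, 0x4021, 0)
    before += struct.pack("<IIII", 0, 0, 0, 1)
    after = bytes(ENTRY_SIZE) + before[ENTRY_SIZE:]
    print(report_changes(before, after))
```

A difference on its own does not indicate a fault; as noted, entries can
legitimately differ after resume on a system where MSI-X works.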

Unfortunately I don't have affected test hardware, currently I'm
waiting for further feedback from Realtek.

> Hmmm...
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-21  8:28           ` Marc Zyngier
@ 2018-08-21 20:54             ` Heiner Kallweit
  0 siblings, 0 replies; 20+ messages in thread
From: Heiner Kallweit @ 2018-08-21 20:54 UTC (permalink / raw)
  To: Marc Zyngier, Bjorn Helgaas, jian-hong
  Cc: David Miller, nic_swsd, netdev, linux-kernel, linux, linux-pci,
	Thomas Gleixner, Christoph Hellwig

On 21.08.2018 10:28, Marc Zyngier wrote:
> On 20/08/18 19:44, Bjorn Helgaas wrote:
>> [+cc Marc, Thomas, Christoph, linux-pci]
>> (beginning of thread at [1])
>>
>> On Thu, Aug 16, 2018 at 09:50:48PM +0200, Heiner Kallweit wrote:
>>> On 16.08.2018 21:39, David Miller wrote:
>>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>>> Date: Thu, 16 Aug 2018 21:37:31 +0200
>>>>
>>>>> On 16.08.2018 21:21, David Miller wrote:
>>>>>> From: <jian-hong@endlessm.com>
>>>>>> Date: Wed, 15 Aug 2018 14:21:10 +0800
>>>>>>
>>>>>>> Found the ethernet network on ASUS X441UAR doesn't come back on resume
>>>>>>> from suspend when using MSI-X.  The chip is RTL8106e - version 39.
>>>>>>
>>>>>> Heiner, please take a look at this.
>>>>>>
>>>>>> You recently disabled MSI-X on RTL8168g for similar reasons.
>>>>>>
>>>>>> Now that we've seen two chips like this, maybe there is some other
>>>>>> problem afoot.
>>>>>>
>>>>> Thanks for the hint. I saw it already and just contacted Realtek
>>>>> whether they are aware of any MSI-X issues with particular chip
>>>>> versions. With the chip versions I have access to MSI-X works fine.
>>>>>
>>>>> There's also the theoretical option that the issues are caused by
>>>>> broken BIOS's. But so far only chip versions have been reported
>>>>> which are very similar, at least with regard to version number
>>>>> (2x VER_40, 1x VER_39). So they may share some buggy component.
>>>>>
>>>>> Let's see whether Realtek can provide some hint.
>>>>> If more chip versions are reported having problems with MSI-X,
>>>>> then we could switch to a whitelist or disable MSI-X in general.
>>>>
>>>> It could be that we need to reprogram some register(s) on resume,
>>>> which normally might not be needed, and that is what is causing the
>>>> problem with some chips.
>>>>
>>> Indeed. That's what I'm checking with Realtek.
>>> In the register list in the r8169 driver there's one entry which
>>> seems to indicate that there are MSI-X specific settings.
>>> However this register isn't used, and the r8168 vendor driver
>>> uses only MSI. And there are no public datasheets.
>>
>> Do we have any information about these chip versions in other systems?
>> Or other devices using MSI-X in the same ASUS system?  It seems
>> possible that there's some PCI core or suspend/resume issue with MSI-X
>> and this patch just avoids it without fixing the root cause.
>>
>> It might be useful to have a kernel.org bugzilla with the complete
>> dmesg, "sudo lspci -vv" output, and /proc/interrupts contents archived
>> for future reference.
> 
> The one system I have with a Realtek chip seems happy enough with MSI-X,
> but it never gets suspended.

Other owners of affected chip versions had the same experience: MSI-X
works fine until resume from suspend.

> There is a comment in the patch that I don't quite get:
> 
>> It is the IRQ 127 - PCI-MSI used by enp2s0.  However, lspci lists MSI is
>> disabled and MSI-X is enabled which conflicts to the interrupt table.
> 
> What do you mean by "conflicts"? With what? Another question is whether
> you've loaded any firmware (some versions of the Realtek HW seem to require
> it).
> 
These "conflicts" were a misunderstanding which was clarified with the
reporter. "PCI-MSI" as irq chip name in /proc/interrupts output was
interpreted to mean that an MSI irq is in use rather than an MSI-X irq.
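
One way to avoid this ambiguity, sketched here in Python: on kernels that
expose the msi_irqs directory in sysfs, each file under
/sys/bus/pci/devices/<BDF>/msi_irqs/ is named after an allocated irq and
contains either "msi" or "msix". The helper name irq_modes is made up for
illustration, and the device address is the one from the reporter's dmesg.

```python
import os
from typing import Dict


def irq_modes(msi_irqs_dir: str) -> Dict[int, str]:
    """Map each allocated irq number to its mode string ("msi" or "msix")."""
    modes = {}
    for name in os.listdir(msi_irqs_dir):
        with open(os.path.join(msi_irqs_dir, name)) as f:
            modes[int(name)] = f.read().strip()
    return modes


if __name__ == "__main__":
    # Assumed device address; adjust for the system under test.
    path = "/sys/bus/pci/devices/0000:02:00.0/msi_irqs"
    if os.path.isdir(path):
        for irq, mode in sorted(irq_modes(path).items()):
            print(irq, mode)
```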

The firmware is for the PHY only, that's at least my experience on
the chip versions I have for testing.

> For posterity, here is some data from my own system; I don't know whether it
> has any relevance to the problem at hand.
> 
> Thanks,
> 
> 	M.
> 
> [    2.624963] r8169 0000:02:00.0 eth0: RTL8168g/8111g, 5a:fe:ad:ce:11:00, XID 4c000800, IRQ 26
> [    2.633398] r8169 0000:02:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
> 
>  26:         50     997005          0          0       MSI 1048576 Edge      enp2s0
> 
> 02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> 	Subsystem: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 64 bytes
> 	Interrupt: pin A routed to IRQ 25
> 	Region 0: I/O ports at 1000 [size=256]
> 	Region 2: Memory at 100004000 (64-bit, prefetchable) [size=4K]
> 	Region 4: Memory at 100000000 (64-bit, prefetchable) [size=16K]
> 	Capabilities: [40] Power Management version 3
> 		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> 		Address: 0000000000000000  Data: 0000
> 	Capabilities: [70] Express (v2) Endpoint, MSI 01
> 		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
> 			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
> 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
> 			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
> 			MaxPayload 128 bytes, MaxReadReq 4096 bytes
> 		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
> 		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
> 			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> 		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via message/WAKE#
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
> 		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance De-emphasis: -6dB
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
> 			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> 	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> 		Vector table: BAR=4 offset=00000000
> 		PBA: BAR=4 offset=00000800
> 	Capabilities: [d0] Vital Product Data
> pcilib: sysfs_read_vpd: read failed: Input/output error
> 		Not readable
> 	Capabilities: [100 v1] Advanced Error Reporting
> 		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> 		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> 		CESta:	RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> 		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
> 		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
> 	Capabilities: [140 v1] Virtual Channel
> 		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
> 		Arb:	Fixed- WRR32- WRR64- WRR128-
> 		Ctrl:	ArbSelect=Fixed
> 		Status:	InProgress-
> 		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> 			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> 			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> 			Status:	NegoPending- InProgress-
> 	Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
> 	Capabilities: [170 v1] Latency Tolerance Reporting
> 		Max snoop latency: 0ns
> 		Max no snoop latency: 0ns
> 	Kernel driver in use: r8169
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-21 20:48               ` Heiner Kallweit
@ 2018-08-22 11:44                 ` Thomas Gleixner
  2018-08-22 19:49                   ` Heiner Kallweit
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Gleixner @ 2018-08-22 11:44 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: David Miller, helgaas, jian-hong, nic_swsd, netdev, linux-kernel,
	linux, linux-pci, marc.zyngier, hch

On Tue, 21 Aug 2018, Heiner Kallweit wrote:
> On 21.08.2018 21:31, David Miller wrote:
> > From: Heiner Kallweit <hkallweit1@gmail.com>
> > Date: Mon, 20 Aug 2018 22:46:48 +0200
> > 
> >> I'm in contact with Realtek and according to them few chip versions
> >> seem to clear MSI-X table entries on resume from suspend. Checking
> >> with them how this could be fixed / worked around.
> >> Worst case we may have to disable MSI-X in general.
> > 
> > I worry that if the chip does this, and somehow MSI-X is enabled and
> > an interrupt is generated, the chip will write to the cleared out
> > MSI-X address.  This will either write garbage into memory or cause
> > a bus error and require PCI error recovery.
> > 
> > It also looks like your test patch doesn't fix things for people who
> > have tested it.
> > 
> The test patch was based on the first info from Realtek which made me
> think that the base address of the MSI-X table is cleared, what
> obviously is not the case.
> 
> After some further tests it seems that the solution isn't as simple
> as storing the MSI-X table entries on suspend and restore them on
> resume. On my system (where MSI-X works fine) MSI-X table entries
> on resume are partially different from the ones on suspend.

Which is not a surprise. Please don't try to fiddle with that at the driver
level. The irq and PCI core code are the ones in charge and if you'd
restore at the wrong point then hell breaks loose.

Can you please do the following:

 1) Store the PCI config space at suspend time
 2) Compare the PCI config space at resume time and print the difference

Do that on a working and a non-working version of Realtek NICs.
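
A minimal sketch of the comparison step, assuming the config space is
captured as text with "lspci -x" (or "lspci -xxx" as root for the full 256
bytes) before suspend and again after resume. The helper names are
hypothetical, and this is a userspace illustration only, not what the PCI
core does internally.

```python
import re
from typing import Dict, List, Tuple

# Matches hex dump rows such as "00: ec 10 36 81 ..."
HEX_LINE = re.compile(r"^([0-9a-f]+):((?: [0-9a-f]{2})+)$")


def parse_hexdump(text: str) -> Dict[int, int]:
    """Map config-space offset -> byte value from lspci hex dump rows."""
    space = {}
    for line in text.splitlines():
        m = HEX_LINE.match(line.strip())
        if not m:
            continue  # skip the decoded, human-readable lines
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            space[base + i] = int(byte, 16)
    return space


def diff_config(before: str, after: str) -> List[Tuple[int, int, int]]:
    """Return (offset, old, new) for every byte that changed."""
    b, a = parse_hexdump(before), parse_hexdump(after)
    common = sorted(b.keys() & a.keys())
    return [(off, b[off], a[off]) for off in common if b[off] != a[off]]


if __name__ == "__main__":
    before = "00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00\n"
    # Hypothetical altered copy, only to show the output format.
    after = before.replace("81 07 04", "81 00 04")
    for off, old, new in diff_config(before, after):
        print(f"offset 0x{off:02x}: {old:02x} -> {new:02x}")
```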

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-22 11:44                 ` Thomas Gleixner
@ 2018-08-22 19:49                   ` Heiner Kallweit
  2018-08-23 10:46                     ` Jian-Hong Pan
  0 siblings, 1 reply; 20+ messages in thread
From: Heiner Kallweit @ 2018-08-22 19:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: David Miller, helgaas, jian-hong, nic_swsd, netdev, linux-kernel,
	linux, linux-pci, marc.zyngier, hch

On 22.08.2018 13:44, Thomas Gleixner wrote:
> On Tue, 21 Aug 2018, Heiner Kallweit wrote:
>> On 21.08.2018 21:31, David Miller wrote:
>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>> Date: Mon, 20 Aug 2018 22:46:48 +0200
>>>
>>>> I'm in contact with Realtek and according to them few chip versions
>>>> seem to clear MSI-X table entries on resume from suspend. Checking
>>>> with them how this could be fixed / worked around.
>>>> Worst case we may have to disable MSI-X in general.
>>>
>>> I worry that if the chip does this, and somehow MSI-X is enabled and
>>> an interrupt is generated, the chip will write to the cleared out
>>> MSI-X address.  This will either write garbage into memory or cause
>>> a bus error and require PCI error recovery.
>>>
>>> It also looks like your test patch doesn't fix things for people who
>>> have tested it.
>>>
>> The test patch was based on the first info from Realtek which made me
>> think that the base address of the MSI-X table is cleared, what
>> obviously is not the case.
>>
>> After some further tests it seems that the solution isn't as simple
>> as storing the MSI-X table entries on suspend and restore them on
>> resume. On my system (where MSI-X works fine) MSI-X table entries
>> on resume are partially different from the ones on suspend.
> 
> Which is not a surprise. Please don't try to fiddle with that at the driver
> level. The irq and PCI core code are the ones in charge and if you'd
> restore at the wrong point then hell breaks loose.
> 
Instead of spending a lot of effort on a workaround which may not be
acceptable, it may be better to fall back to MSI on all affected chip
versions. For two chip versions which were reported to have this issue
we're doing this already. I asked Realtek whether they have an overview
of which chip versions are affected, let's see ..

The Realtek chips provide an alternative, register-based way to access
the MSI-X table, and their Windows driver seems to use it. See here:
https://patchwork.kernel.org/patch/4149171/

But as we handle all MSI-X basics in the PCI core, this isn't an option.


> Can you please do the following:
> 
>  1) Store the PCI config space at suspend time
>  2) Compare the PCI config space at resume time and print the difference
> 
> Do that on a working and a non-working version of Realtek NICs.
> 
> Thanks,
> 
> 	tglx
> 
> 
> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-22 19:49                   ` Heiner Kallweit
@ 2018-08-23 10:46                     ` Jian-Hong Pan
  2018-08-23 13:38                       ` Bjorn Helgaas
  0 siblings, 1 reply; 20+ messages in thread
From: Jian-Hong Pan @ 2018-08-23 10:46 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Thomas Gleixner, David Miller, helgaas,
	Realtek linux nic maintainers, netdev, Linux Kernel,
	Linux Upstreaming Team, linux-pci, marc.zyngier, hch

2018-08-23 3:49 GMT+08:00 Heiner Kallweit <hkallweit1@gmail.com>:
> On 22.08.2018 13:44, Thomas Gleixner wrote:
>> On Tue, 21 Aug 2018, Heiner Kallweit wrote:
>>> On 21.08.2018 21:31, David Miller wrote:
>>>> From: Heiner Kallweit <hkallweit1@gmail.com>
>>>> Date: Mon, 20 Aug 2018 22:46:48 +0200
>>>>
>>>>> I'm in contact with Realtek and according to them few chip versions
>>>>> seem to clear MSI-X table entries on resume from suspend. Checking
>>>>> with them how this could be fixed / worked around.
>>>>> Worst case we may have to disable MSI-X in general.
>>>>
>>>> I worry that if the chip does this, and somehow MSI-X is enabled and
>>>> an interrupt is generated, the chip will write to the cleared out
>>>> MSI-X address.  This will either write garbage into memory or cause
>>>> a bus error and require PCI error recovery.
>>>>
>>>> It also looks like your test patch doesn't fix things for people who
>>>> have tested it.
>>>>
>>> The test patch was based on the first info from Realtek which made me
>>> think that the base address of the MSI-X table is cleared, what
>>> obviously is not the case.
>>>
>>> After some further tests it seems that the solution isn't as simple
>>> as storing the MSI-X table entries on suspend and restore them on
>>> resume. On my system (where MSI-X works fine) MSI-X table entries
>>> on resume are partially different from the ones on suspend.
>>
>> Which is not a surprise. Please don't try to fiddle with that at the driver
>> level. The irq and PCI core code are the ones in charge and if you'd
>> restore at the wrong point then hell breaks loose.
>>
> Instead of spending a lot of effort on a workaround which may not be
> acceptable, it may be better to fall back to MSI on all affected chip
> versions. For two chip versions which were reported to have this issue
> we're doing this already. I asked Realtek whether they have an overview
> which chip versions are affected, let's see ..
>
> The Realtek chips provide an alternative, register-based way to access
> the MSI-X table, and their Windows driver seems to use it. See here:
> https://patchwork.kernel.org/patch/4149171/
>
> But as we handle all MSI-X basics in the PCI core, this isn't an option.
>
>
>> Can you please do the following:

Tested on ASUS X441UAR equipped with RTL8106e.
This is the laptop whose ethernet does not come back after resume if
it does not fall back to MSI.

Here is the full dmesg:
https://gist.github.com/starnight/e65a97c9bf2d558926895ab76974687e

>>  1) Store the PCI config space at suspend time

Before suspend:

dev@endless:~$ sudo lspci -xnnvvs 02:00.0
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet
controller [1043:200f]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: I/O ports at e000 [size=256]
Region 2: Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000  Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via
message/WAKE#
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
Not readable
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 01-00-00-00-36-4c-e0-00
Capabilities: [170 v1] Latency Tolerance Reporting
Max snoop latency: 3145728ns
Max no snoop latency: 3145728ns
Kernel driver in use: r8169
Kernel modules: r8169
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00

>>  2) Compare the PCI config space at resume time and print the difference

After resume:

dev@endless:~$ sudo lspci -xnnvvs 02:00.0
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller [10ec:8136]
(rev 07)
Subsystem: ASUSTeK Computer Inc. RTL810xE PCI Express Fast Ethernet
controller [1043:200f]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: I/O ports at e000 [size=256]
Region 2: Memory at ef100000 (64-bit, non-prefetchable) [size=4K]
Region 4: Memory at e0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000  Data: 0000
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency
L0s unlimited, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Via
message/WAKE#
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: Input/output error
Not readable
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF-
MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+
MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 01-00-00-00-36-4c-e0-00
Capabilities: [170 v1] Latency Tolerance Reporting
Max snoop latency: 3145728ns
Max no snoop latency: 3145728ns
Kernel driver in use: r8169
Kernel modules: r8169
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00

After comparing, there is no difference between before suspend and after resume.

Regards,
Jian-Hong Pan

>> Do that on a working and a non-working version of Realtek NICs.
>>
>> Thanks,
>>
>>       tglx
>>
>>
>>
>

* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-23 10:46                     ` Jian-Hong Pan
@ 2018-08-23 13:38                       ` Bjorn Helgaas
  2018-08-27 10:46                         ` Jian-Hong Pan
  0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2018-08-23 13:38 UTC (permalink / raw)
  To: Jian-Hong Pan
  Cc: Heiner Kallweit, Thomas Gleixner, David Miller,
	Realtek linux nic maintainers, netdev, Linux Kernel,
	Linux Upstreaming Team, linux-pci, marc.zyngier, hch

On Thu, Aug 23, 2018 at 06:46:28PM +0800, Jian-Hong Pan wrote:
> > On 22.08.2018 13:44, Thomas Gleixner wrote:
> >> Can you please do the following:
> 
> Tested on ASUS X441AUR equipped with RTL8106e.
> This is the laptop whose ethernet does not come back after resume, if
> it does not fallback to MSI.
> ...

> dev@endless:~$ sudo lspci -xnnvvs 02:00.0
> ...
> 00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
> 10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
> 20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
> 
> After comparing, there is no difference between before suspend and
> after resume.

It'd be better to compare the hex data directly and ignore the lspci
decoding, since lspci doesn't decode everything.  You only dumped the
first 0x40 bytes of config space, and all capabilities, including the
MSI and MSI-X capabilities, are past that:

> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> Vector table: BAR=4 offset=00000000
> PBA: BAR=4 offset=00000800

In addition, some of the MSI-X information for this device is in BAR
4.  "lspci -xxx" will dump all config space, and you can use a tool
like http://cmp.felk.cvut.cz/~pisa/linux/rdwrmem.c or
https://github.com/billfarrow/pcimem to dump the BAR contents.
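
As an illustration of the byte-level comparison suggested above (not part
of the original message), here is a minimal Python sketch. It assumes the
two "lspci -xxx" outputs have been saved as text; the function names
parse_lspci_hex and diff_config are hypothetical helpers, not part of
pciutils.

```python
import re

# One row of an `lspci -x`/`-xxx` hex dump: "b0: 11 d0 03 80 ..."
ROW = re.compile(r"^\s*([0-9a-f]+):((?:\s+[0-9a-f]{2})+)\s*$", re.IGNORECASE)


def parse_lspci_hex(text):
    """Return {config_offset: byte_value} for every hex row found in text."""
    regs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip the device header and any decoded lines
        base = int(m.group(1), 16)
        for i, tok in enumerate(m.group(2).split()):
            regs[base + i] = int(tok, 16)
    return regs


def diff_config(before, after):
    """List (offset, before_value, after_value) for every byte that differs."""
    offsets = sorted(set(before) | set(after))
    return [(off, before.get(off), after.get(off))
            for off in offsets
            if before.get(off) != after.get(off)]
```

Calling diff_config(parse_lspci_hex(dump_before), parse_lspci_hex(dump_after))
returns an empty list when the two dumps are byte-for-byte identical. A byte
missing from one dump shows up as None on that side.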

* Re: [PATCH] r8169: don't use MSI-X on RTL8106e
  2018-08-23 13:38                       ` Bjorn Helgaas
@ 2018-08-27 10:46                         ` Jian-Hong Pan
  0 siblings, 0 replies; 20+ messages in thread
From: Jian-Hong Pan @ 2018-08-27 10:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Heiner Kallweit, Thomas Gleixner, David Miller,
	Realtek linux nic maintainers, netdev, Linux Kernel,
	Linux Upstreaming Team, linux-pci, marc.zyngier, hch

2018-08-23 21:38 GMT+08:00 Bjorn Helgaas <helgaas@kernel.org>:
> On Thu, Aug 23, 2018 at 06:46:28PM +0800, Jian-Hong Pan wrote:
>> > On 22.08.2018 13:44, Thomas Gleixner wrote:
>> >> Can you please do the following:
>>
>> Tested on ASUS X441AUR equipped with RTL8106e.
>> This is the laptop whose ethernet does not come back after resume, if
>> it does not fallback to MSI.
>> ...
>
>> dev@endless:~$ sudo lspci -xnnvvs 02:00.0
>> ...
>> 00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
>> 10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
>> 20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
>> 30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
>>
>> After comparing, there is no difference between before suspend and
>> after resume.
>
> It'd be better to compare the hex data directly and ignore the lspci
> decoding, since lspci doesn't decode everything.  You only dumped the
> first 0x40 bytes of config space, and all capabilities, including the
> MSI and MSI-X capabilities, are past that:
>
>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>> Vector table: BAR=4 offset=00000000
>> PBA: BAR=4 offset=00000800
>
> In addition, some of the MSI-X information for this device is in BAR
> 4.  "lspci -xxx" will dump all config space, and you can use a tool
> like http://cmp.felk.cvut.cz/~pisa/linux/rdwrmem.c or
> https://github.com/billfarrow/pcimem to dump the BAR contents.

Tested again on the ASUS X441UAR equipped with RTL8106e, this time without
falling back to MSI.
Used lspci and https://github.com/billfarrow/pcimem

Here is the status before suspend:

dev@endless:~$ sudo lspci -xxxs 02:00.0
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 ff 08 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 b0 02 02 c0 8d 90 05 10 20 10 00 11 7c 47 00
80: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 1f 08 0c 00 00 04 00 00 02 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 11 d0 03 80 04 00 00 00 04 08 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev@endless:~$ sudo ~/pcimem/pcimem /sys/devices/pci0000\:00/0000\:00\:1c.4/0000\:02\:00.0/resource4 0 b*16384
[sudo] password for dev:
/sys/devices/pci0000:00/0000:00:1c.4/0000:02:00.0/resource4 opened.
Target offset is 0x0, page size is 4096
mmap(0, 16384, 0x3, 0x1, 3, 0x0)
PCI Memory mapped to address 0x7f15186d1000.
0x0000: 0x38
0x0001: 0x03
0x0002: 0xE0
0x0003: 0xFE
0x0004: 0x00
...
0x0010: 0x41
0x0011: 0x72
.
.
.
0x003C: 0x01
0x003D: 0x00
...
0x1000: 0x38
0x1001: 0x03
.
.
.

After resume:

dev@endless:~$ sudo lspci -xxxs 02:00.0
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8101/2/6E PCI Express Fast/Gigabit Ethernet controller (rev 07)
00: ec 10 36 81 07 04 10 00 07 00 00 02 10 00 00 00
10: 01 e0 00 00 00 00 00 00 04 00 10 ef 00 00 00 00
20: 0c 00 00 e0 00 00 00 00 00 00 00 00 43 10 0f 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 01 50 c3 ff 08 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 b0 02 02 c0 8d 90 05 10 20 10 00 11 7c 47 00
80: 42 01 11 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 1f 08 0c 00 00 04 00 00 02 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 11 d0 03 80 04 00 00 00 04 08 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

dev@endless:~$ sudo ~/pcimem/pcimem /sys/devices/pci0000\:00/0000\:00\:1c.4/0000\:02\:00.0/resource4 0 b*16384
/sys/devices/pci0000:00/0000:00:1c.4/0000:02:00.0/resource4 opened.
Target offset is 0x0, page size is 4096
mmap(0, 16384, 0x3, 0x1, 3, 0x0)
PCI Memory mapped to address 0x7f8d68dd5000.
0x0000: 0xFF
...


The config space is the same before suspend and after resume, but the
values in BAR 4 are wrong after resume: they all read as 0xFF.
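
For reference, the MSI-X capability bytes at config offset 0xb0 in the dumps
above can be decoded by hand. The following is a minimal Python sketch,
assuming the standard MSI-X capability layout (capability ID 0x11, 16-bit
message control, then the table and PBA offset/BIR dwords). The function
name decode_msix_cap is hypothetical.

```python
import struct


def decode_msix_cap(cfg, cap_off):
    """Decode the 12-byte MSI-X capability found at cap_off in config space."""
    cap_id, next_ptr, msg_ctl, table, pba = struct.unpack_from(
        "<BBHII", cfg, cap_off)
    if cap_id != 0x11:
        raise ValueError("no MSI-X capability at offset 0x%02x" % cap_off)
    return {
        "next": next_ptr,
        "enabled": bool(msg_ctl & 0x8000),          # bit 15: MSI-X Enable
        "function_masked": bool(msg_ctl & 0x4000),  # bit 14: Function Mask
        "table_size": (msg_ctl & 0x07FF) + 1,       # field holds N - 1
        "table_bir": table & 0x7,                   # BAR holding the table
        "table_offset": table & 0xFFFFFFF8,
        "pba_bir": pba & 0x7,                       # BAR holding the PBA
        "pba_offset": pba & 0xFFFFFFF8,
    }


# Row "b0:" from the lspci -xxx dump, placed at offset 0xb0 of a zeroed buffer.
cfg = bytes(0xb0) + bytes.fromhex("11d00380040000000408000000000000")
msix = decode_msix_cap(cfg, 0xb0)
```

With the bytes from the dump this gives enabled=True, function_masked=False,
table_size=4, table in BAR 4 at offset 0 and PBA in BAR 4 at offset 0x800,
which agrees with the lspci decoding ("Enable+ Count=4 Masked-"). The vector
table therefore starts at offset 0 of BAR 4, the region that reads back as
0xFF after resume.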

Regards,
Jian-Hong Pan

Thread overview: 20+ messages
2018-08-15  6:21 [PATCH] r8169: don't use MSI-X on RTL8106e jian-hong
2018-08-16 19:21 ` David Miller
2018-08-16 19:37   ` Heiner Kallweit
2018-08-16 19:39     ` David Miller
2018-08-16 19:50       ` Heiner Kallweit
2018-08-20 18:44         ` Bjorn Helgaas
2018-08-20 20:46           ` Heiner Kallweit
2018-08-21 19:31             ` David Miller
2018-08-21 20:48               ` Heiner Kallweit
2018-08-22 11:44                 ` Thomas Gleixner
2018-08-22 19:49                   ` Heiner Kallweit
2018-08-23 10:46                     ` Jian-Hong Pan
2018-08-23 13:38                       ` Bjorn Helgaas
2018-08-27 10:46                         ` Jian-Hong Pan
2018-08-21  8:28           ` Marc Zyngier
2018-08-21 20:54             ` Heiner Kallweit
2018-08-20 20:40         ` Florian Fainelli
2018-08-20 20:56           ` Heiner Kallweit
2018-08-17  5:07 ` [PATCH v2 net] " Jian-Hong Pan
2018-08-19 18:01   ` David Miller
