All of lore.kernel.org
 help / color / mirror / Atom feed
* RTL8402 stops working after hibernate/resume
@ 2020-07-15  8:28 Petr Tesarik
  2020-07-15 15:22 ` Heiner Kallweit
  2020-07-15 15:25 ` Heiner Kallweit
  0 siblings, 2 replies; 34+ messages in thread
From: Petr Tesarik @ 2020-07-15  8:28 UTC (permalink / raw)
  To: Realtek linux nic maintainers, Heiner Kallweit; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 814 bytes --]

Hi all,

I've encountered some issues on an Asus laptop. The RTL8402 receive
queue behaves strangely after suspend to RAM and resume - many incoming
packets are truncated, but not all and not always to the same length
(most commonly 60 bytes, but I've also seen 150 bytes and other
lengths).

Reloading the driver can fix the problem, so I believe we must be
missing some initialization on resume. I've already done some
debugging, and the interface is not running when rtl8169_resume() is
called, so __rtl8169_resume() is skipped, which means that almost
nothing is done on resume.

Some more information can be found in this openSUSE bug report:

https://bugzilla.opensuse.org/show_bug.cgi?id=1174098

The laptop is not (yet) in production, so I can do further debugging if
needed.

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-07-15  8:28 RTL8402 stops working after hibernate/resume Petr Tesarik
@ 2020-07-15 15:22 ` Heiner Kallweit
  2020-07-16  8:58   ` Petr Tesarik
  2020-07-15 15:25 ` Heiner Kallweit
  1 sibling, 1 reply; 34+ messages in thread
From: Heiner Kallweit @ 2020-07-15 15:22 UTC (permalink / raw)
  To: Petr Tesarik, Realtek linux nic maintainers; +Cc: netdev

On 15.07.2020 10:28, Petr Tesarik wrote:
> Hi all,
> 
> I've encountered some issues on an Asus laptop. The RTL8402 receive
> queue behaves strangely after suspend to RAM and resume - many incoming
> packets are truncated, but not all and not always to the same length
> (most commonly 60 bytes, but I've also seen 150 bytes and other
> lengths).
> 
> Reloading the driver can fix the problem, so I believe we must be
> missing some initialization on resume. I've already done some
> debugging, and the interface is not running when rtl8169_resume() is
> called, so __rtl8169_resume() is skipped, which means that almost
> nothing is done on resume.
> 
The dmesg log part in the opensuse bug report indicates that a userspace
tool (e.g. NetworkManager) brings down the interface on suspend.
On resume the interface is brought up again, and PHY driver is loaded.
Therefore it's ok that rtl8169_resume() is a no-op.

The bug report mentions that the link was down before suspending.
Does the issue also happen if the link is up when suspending?

Interesting would also be a test w/o a network manager.
Means the interface stays up during suspend/resume cycle.

Unfortunately it's not known whether it's a regression, and I have no
test hw with this chip version.

Also you could test whether the same happens with the r8101 vendor driver.

> Some more information can be found in this openSUSE bug report:
> 
> https://bugzilla.opensuse.org/show_bug.cgi?id=1174098
> 
> The laptop is not (yet) in production, so I can do further debugging if
> needed.
> 
> Petr T
> 
Heiner

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-07-15  8:28 RTL8402 stops working after hibernate/resume Petr Tesarik
  2020-07-15 15:22 ` Heiner Kallweit
@ 2020-07-15 15:25 ` Heiner Kallweit
  1 sibling, 0 replies; 34+ messages in thread
From: Heiner Kallweit @ 2020-07-15 15:25 UTC (permalink / raw)
  To: Petr Tesarik, Realtek linux nic maintainers; +Cc: netdev

On 15.07.2020 10:28, Petr Tesarik wrote:
> Hi all,
> 
> I've encountered some issues on an Asus laptop. The RTL8402 receive
> queue behaves strangely after suspend to RAM and resume - many incoming
> packets are truncated, but not all and not always to the same length
> (most commonly 60 bytes, but I've also seen 150 bytes and other
> lengths).
> 
> Reloading the driver can fix the problem, so I believe we must be
> missing some initialization on resume. I've already done some
> debugging, and the interface is not running when rtl8169_resume() is
> called, so __rtl8169_resume() is skipped, which means that almost
> nothing is done on resume.
> 
> Some more information can be found in this openSUSE bug report:
> 
> https://bugzilla.opensuse.org/show_bug.cgi?id=1174098
> 
> The laptop is not (yet) in production, so I can do further debugging if
> needed.
> 
> Petr T
> 
One additional question: Do you have the firmware in place?
Check "ethtool -i <if>" for a firmware version.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-07-15 15:22 ` Heiner Kallweit
@ 2020-07-16  8:58   ` Petr Tesarik
  2020-07-18 12:07     ` Heiner Kallweit
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Tesarik @ 2020-07-16  8:58 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

Hi Heiner,

first, thank you for looking into this!

On Wed, 15 Jul 2020 17:22:35 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 15.07.2020 10:28, Petr Tesarik wrote:
> > Hi all,
> > 
> > I've encountered some issues on an Asus laptop. The RTL8402 receive
> > queue behaves strangely after suspend to RAM and resume - many incoming
> > packets are truncated, but not all and not always to the same length
> > (most commonly 60 bytes, but I've also seen 150 bytes and other
> > lengths).
> > 
> > Reloading the driver can fix the problem, so I believe we must be
> > missing some initialization on resume. I've already done some
> > debugging, and the interface is not running when rtl8169_resume() is
> > called, so __rtl8169_resume() is skipped, which means that almost
> > nothing is done on resume.
> >   
> The dmesg log part in the opensuse bug report indicates that a userspace
> tool (e.g. NetworkManager) brings down the interface on suspend.
> On resume the interface is brought up again, and PHY driver is loaded.
> Therefore it's ok that rtl8169_resume() is a no-op.
> 
> The bug report mentions that the link was down before suspending.
> Does the issue also happen if the link is up when suspending?

I have tried, and it makes no difference.

> Interesting would also be a test w/o a network manager.
> Means the interface stays up during suspend/resume cycle.

I have stopped NetworkManager and configured a static IP address for
the interface. Still the same result.

I have verified that the firmware is loaded, both before suspend and
after resume:

zabulon:~ # ethtool -i eth0 
driver: r8169
version: 5.7.7-1-default
firmware-version: rtl8402-1_0.0.1 10/26/11
expansion-rom-version: 
bus-info: 0000:03:00.2
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

> Unfortunately it's not known whether it's a regression, and I have no
> test hw with this chip version.
> 
> Also you could test whether the same happens with the r8101 vendor driver.

I was not aware of this alternative driver... Anyway, I have built
r8101 from git (v1.035.03) for kernel 5.7.7. When loaded, it hangs the
machine hard. I mean like not even SysRq+B works...

Petr T

> > Some more information can be found in this openSUSE bug report:
> > 
> > https://bugzilla.opensuse.org/show_bug.cgi?id=1174098
> > 
> > The laptop is not (yet) in production, so I can do further debugging if
> > needed.
> > 
> > Petr T
> >   
> Heiner


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-07-16  8:58   ` Petr Tesarik
@ 2020-07-18 12:07     ` Heiner Kallweit
  2020-09-03  8:41       ` Petr Tesarik
  0 siblings, 1 reply; 34+ messages in thread
From: Heiner Kallweit @ 2020-07-18 12:07 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Realtek linux nic maintainers, netdev

On 16.07.2020 10:58, Petr Tesarik wrote:
> Hi Heiner,
> 
> first, thank you for looking into this!
> 
> On Wed, 15 Jul 2020 17:22:35 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 15.07.2020 10:28, Petr Tesarik wrote:
>>> Hi all,
>>>
>>> I've encountered some issues on an Asus laptop. The RTL8402 receive
>>> queue behaves strangely after suspend to RAM and resume - many incoming
>>> packets are truncated, but not all and not always to the same length
>>> (most commonly 60 bytes, but I've also seen 150 bytes and other
>>> lengths).
>>>
>>> Reloading the driver can fix the problem, so I believe we must be
>>> missing some initialization on resume. I've already done some
>>> debugging, and the interface is not running when rtl8169_resume() is
>>> called, so __rtl8169_resume() is skipped, which means that almost
>>> nothing is done on resume.
>>>   
>> The dmesg log part in the opensuse bug report indicates that a userspace
>> tool (e.g. NetworkManager) brings down the interface on suspend.
>> On resume the interface is brought up again, and PHY driver is loaded.
>> Therefore it's ok that rtl8169_resume() is a no-op.
>>
>> The bug report mentions that the link was down before suspending.
>> Does the issue also happen if the link is up when suspending?
> 
> I have tried, and it makes no difference.
> 
>> Interesting would also be a test w/o a network manager.
>> Means the interface stays up during suspend/resume cycle.
> 
> I have stopped NetworkManager and configured a static IP address for
> the interface. Still the same result.
> 
> I have verified that the firmware is loaded, both before suspend and
> after resume:
> 
> zabulon:~ # ethtool -i eth0 
> driver: r8169
> version: 5.7.7-1-default
> firmware-version: rtl8402-1_0.0.1 10/26/11
> expansion-rom-version: 
> bus-info: 0000:03:00.2
> supports-statistics: yes
> supports-test: no
> supports-eeprom-access: no
> supports-register-dump: yes
> supports-priv-flags: no
> 
>> Unfortunately it's not known whether it's a regression, and I have no
>> test hw with this chip version.
>>
>> Also you could test whether the same happens with the r8101 vendor driver.
> 
> I was not aware of this alternative driver... Anyway, I have built
> r8101 from git (v1.035.03) for kernel 5.7.7. When loaded, it hangs the
> machine hard. I mean like not even SysRq+B works...
> 
> Petr T
> 
>>> Some more information can be found in this openSUSE bug report:
>>>
>>> https://bugzilla.opensuse.org/show_bug.cgi?id=1174098
>>>
>>> The laptop is not (yet) in production, so I can do further debugging if
>>> needed.
>>>
>>> Petr T
>>>   
>> Heiner
> 

Maybe the following gives us an idea:
Please do "ethtool -d <if>" after boot and after resume from suspend,
and check for differences.







^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-07-18 12:07     ` Heiner Kallweit
@ 2020-09-03  8:41       ` Petr Tesarik
  2020-09-17 14:10         ` Petr Tesarik
  2020-09-23  9:57         ` Heiner Kallweit
  0 siblings, 2 replies; 34+ messages in thread
From: Petr Tesarik @ 2020-09-03  8:41 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 1702 bytes --]

Hi Heiner,

this issue was on the back-burner for some time, but I've got some
interesting news now.

On Sat, 18 Jul 2020 14:07:50 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

>[...]
> Maybe the following gives us an idea:
> Please do "ethtool -d <if>" after boot and after resume from suspend,
> and check for differences.

The register dump did not reveal anything of interest - the only
differences were in the physical addresses after a device reopen.

However, knowing that reloading the driver can fix the issue, I copied
the initialization sequence from init_one() to rtl8169_resume() and
gave it a try. That works!

Then I started removing the initialization calls one by one. This
exercise left me with a call to rtl_init_rxcfg(), which simply sets the
RxConfig register. In other words, these is the difference between
5.8.4 and my working version:

--- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
+++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
@@ -4925,6 +4925,9 @@
 
 	clk_prepare_enable(tp->clk);
 
+	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
+		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
+
 	if (netif_running(tp->dev))
 		__rtl8169_resume(tp);
 
This is quite surprising, at least when the device is managed by
NetworkManager, because then it is closed on wakeup, and the open
method should call rtl_init_rxcfg() anyway. So, it might be a timing
issue, or incorrect order of register writes.

Since I have no idea why the above change fixes my issue, I'm hesitant
to post it as a patch. It might break other people's systems...

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-03  8:41       ` Petr Tesarik
@ 2020-09-17 14:10         ` Petr Tesarik
  2020-09-23  9:57         ` Heiner Kallweit
  1 sibling, 0 replies; 34+ messages in thread
From: Petr Tesarik @ 2020-09-17 14:10 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 1914 bytes --]

Hi Heiner,

any comment on my findings?

On Thu, 3 Sep 2020 10:41:22 +0200
Petr Tesarik <ptesarik@suse.cz> wrote:

> Hi Heiner,
> 
> this issue was on the back-burner for some time, but I've got some
> interesting news now.
> 
> On Sat, 18 Jul 2020 14:07:50 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
> >[...]
> > Maybe the following gives us an idea:
> > Please do "ethtool -d <if>" after boot and after resume from suspend,
> > and check for differences.  
> 
> The register dump did not reveal anything of interest - the only
> differences were in the physical addresses after a device reopen.
> 
> However, knowing that reloading the driver can fix the issue, I copied
> the initialization sequence from init_one() to rtl8169_resume() and
> gave it a try. That works!
> 
> Then I started removing the initialization calls one by one. This
> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
> RxConfig register. In other words, these is the difference between
> 5.8.4 and my working version:
> 
> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
> @@ -4925,6 +4925,9 @@
>  
>  	clk_prepare_enable(tp->clk);
>  
> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
> +
>  	if (netif_running(tp->dev))
>  		__rtl8169_resume(tp);
>  
> This is quite surprising, at least when the device is managed by
> NetworkManager, because then it is closed on wakeup, and the open
> method should call rtl_init_rxcfg() anyway. So, it might be a timing
> issue, or incorrect order of register writes.
> 
> Since I have no idea why the above change fixes my issue, I'm hesitant
> to post it as a patch. It might break other people's systems...

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-03  8:41       ` Petr Tesarik
  2020-09-17 14:10         ` Petr Tesarik
@ 2020-09-23  9:57         ` Heiner Kallweit
  2020-09-24 19:14           ` Petr Tesarik
  1 sibling, 1 reply; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-23  9:57 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Realtek linux nic maintainers, netdev

On 03.09.2020 10:41, Petr Tesarik wrote:
> Hi Heiner,
> 
> this issue was on the back-burner for some time, but I've got some
> interesting news now.
> 
> On Sat, 18 Jul 2020 14:07:50 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> [...]
>> Maybe the following gives us an idea:
>> Please do "ethtool -d <if>" after boot and after resume from suspend,
>> and check for differences.
> 
> The register dump did not reveal anything of interest - the only
> differences were in the physical addresses after a device reopen.
> 
> However, knowing that reloading the driver can fix the issue, I copied
> the initialization sequence from init_one() to rtl8169_resume() and
> gave it a try. That works!
> 
> Then I started removing the initialization calls one by one. This
> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
> RxConfig register. In other words, these is the difference between
> 5.8.4 and my working version:
> 
> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
> @@ -4925,6 +4925,9 @@
>  
>  	clk_prepare_enable(tp->clk);
>  
> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
> +
>  	if (netif_running(tp->dev))
>  		__rtl8169_resume(tp);
>  
> This is quite surprising, at least when the device is managed by
> NetworkManager, because then it is closed on wakeup, and the open
> method should call rtl_init_rxcfg() anyway. So, it might be a timing
> issue, or incorrect order of register writes.
> 
Thanks for the analysis. If you manually bring down and up the
interface, do you see the same issue?
What is the value of RxConfig when entering the resume function?

> Since I have no idea why the above change fixes my issue, I'm hesitant
> to post it as a patch. It might break other people's systems...
> 
> Petr T
> 
Heiner

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-23  9:57         ` Heiner Kallweit
@ 2020-09-24 19:14           ` Petr Tesarik
  2020-09-24 19:37             ` Heiner Kallweit
  2020-09-24 20:12             ` Heiner Kallweit
  0 siblings, 2 replies; 34+ messages in thread
From: Petr Tesarik @ 2020-09-24 19:14 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 2648 bytes --]

On Wed, 23 Sep 2020 11:57:41 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 03.09.2020 10:41, Petr Tesarik wrote:
> > Hi Heiner,
> > 
> > this issue was on the back-burner for some time, but I've got some
> > interesting news now.
> > 
> > On Sat, 18 Jul 2020 14:07:50 +0200
> > Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >   
> >> [...]
> >> Maybe the following gives us an idea:
> >> Please do "ethtool -d <if>" after boot and after resume from suspend,
> >> and check for differences.  
> > 
> > The register dump did not reveal anything of interest - the only
> > differences were in the physical addresses after a device reopen.
> > 
> > However, knowing that reloading the driver can fix the issue, I copied
> > the initialization sequence from init_one() to rtl8169_resume() and
> > gave it a try. That works!
> > 
> > Then I started removing the initialization calls one by one. This
> > exercise left me with a call to rtl_init_rxcfg(), which simply sets the
> > RxConfig register. In other words, these is the difference between
> > 5.8.4 and my working version:
> > 
> > --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
> > +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
> > @@ -4925,6 +4925,9 @@
> >  
> >  	clk_prepare_enable(tp->clk);
> >  
> > +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
> > +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
> > +
> >  	if (netif_running(tp->dev))
> >  		__rtl8169_resume(tp);
> >  
> > This is quite surprising, at least when the device is managed by
> > NetworkManager, because then it is closed on wakeup, and the open
> > method should call rtl_init_rxcfg() anyway. So, it might be a timing
> > issue, or incorrect order of register writes.
> >   
> Thanks for the analysis. If you manually bring down and up the
> interface, do you see the same issue?

I'm not quite sure what you mean, but if the interface is configured
(and NetworkManager is stopped), I can do 'ip link set eth0 down' and
then 'ip link set eth0 up', and the interface is fully functional.

> What is the value of RxConfig when entering the resume function?

I added a dev_info() to rtl8169_resume(). First with NetworkManager
active (i.e. interface down on suspend):

[  525.956675] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f

Then I re-tried with NetworkManager stopped (i.e. interface up on
suspend). Same result:

[  785.413887] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f

I hope that's what you were asking for...

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-24 19:14           ` Petr Tesarik
@ 2020-09-24 19:37             ` Heiner Kallweit
  2020-09-24 20:12             ` Heiner Kallweit
  1 sibling, 0 replies; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-24 19:37 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Realtek linux nic maintainers, netdev

On 24.09.2020 21:14, Petr Tesarik wrote:
> On Wed, 23 Sep 2020 11:57:41 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 03.09.2020 10:41, Petr Tesarik wrote:
>>> Hi Heiner,
>>>
>>> this issue was on the back-burner for some time, but I've got some
>>> interesting news now.
>>>
>>> On Sat, 18 Jul 2020 14:07:50 +0200
>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>   
>>>> [...]
>>>> Maybe the following gives us an idea:
>>>> Please do "ethtool -d <if>" after boot and after resume from suspend,
>>>> and check for differences.  
>>>
>>> The register dump did not reveal anything of interest - the only
>>> differences were in the physical addresses after a device reopen.
>>>
>>> However, knowing that reloading the driver can fix the issue, I copied
>>> the initialization sequence from init_one() to rtl8169_resume() and
>>> gave it a try. That works!
>>>
>>> Then I started removing the initialization calls one by one. This
>>> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
>>> RxConfig register. In other words, these is the difference between
>>> 5.8.4 and my working version:
>>>
>>> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
>>> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
>>> @@ -4925,6 +4925,9 @@
>>>  
>>>  	clk_prepare_enable(tp->clk);
>>>  
>>> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
>>> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
>>> +
>>>  	if (netif_running(tp->dev))
>>>  		__rtl8169_resume(tp);
>>>  
>>> This is quite surprising, at least when the device is managed by
>>> NetworkManager, because then it is closed on wakeup, and the open
>>> method should call rtl_init_rxcfg() anyway. So, it might be a timing
>>> issue, or incorrect order of register writes.
>>>   
>> Thanks for the analysis. If you manually bring down and up the
>> interface, do you see the same issue?
> 
> I'm not quite sure what you mean, but if the interface is configured
> (and NetworkManager is stopped), I can do 'ip link set eth0 down' and
> then 'ip link set eth0 up', and the interface is fully functional.
> 
>> What is the value of RxConfig when entering the resume function?
> 
> I added a dev_info() to rtl8169_resume(). First with NetworkManager
> active (i.e. interface down on suspend):
> 
> [  525.956675] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> 
> Then I re-tried with NetworkManager stopped (i.e. interface up on
> suspend). Same result:
> 
> [  785.413887] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> 
Thanks for checking this. The value is not what I would have expected,
it may be a BIOS bug or HW issue with RTL8402. I'll slightly modify
your proposed patch and then let you re-test it.

> I hope that's what you were asking for...
> 
> Petr T
> 
Heiner

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-24 19:14           ` Petr Tesarik
  2020-09-24 19:37             ` Heiner Kallweit
@ 2020-09-24 20:12             ` Heiner Kallweit
  2020-09-25  7:30               ` Petr Tesarik
  1 sibling, 1 reply; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-24 20:12 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Realtek linux nic maintainers, netdev

On 24.09.2020 21:14, Petr Tesarik wrote:
> On Wed, 23 Sep 2020 11:57:41 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 03.09.2020 10:41, Petr Tesarik wrote:
>>> Hi Heiner,
>>>
>>> this issue was on the back-burner for some time, but I've got some
>>> interesting news now.
>>>
>>> On Sat, 18 Jul 2020 14:07:50 +0200
>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>   
>>>> [...]
>>>> Maybe the following gives us an idea:
>>>> Please do "ethtool -d <if>" after boot and after resume from suspend,
>>>> and check for differences.  
>>>
>>> The register dump did not reveal anything of interest - the only
>>> differences were in the physical addresses after a device reopen.
>>>
>>> However, knowing that reloading the driver can fix the issue, I copied
>>> the initialization sequence from init_one() to rtl8169_resume() and
>>> gave it a try. That works!
>>>
>>> Then I started removing the initialization calls one by one. This
>>> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
>>> RxConfig register. In other words, these is the difference between
>>> 5.8.4 and my working version:
>>>
>>> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
>>> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
>>> @@ -4925,6 +4925,9 @@
>>>  
>>>  	clk_prepare_enable(tp->clk);
>>>  
>>> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
>>> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
>>> +
>>>  	if (netif_running(tp->dev))
>>>  		__rtl8169_resume(tp);
>>>  
>>> This is quite surprising, at least when the device is managed by
>>> NetworkManager, because then it is closed on wakeup, and the open
>>> method should call rtl_init_rxcfg() anyway. So, it might be a timing
>>> issue, or incorrect order of register writes.
>>>   
>> Thanks for the analysis. If you manually bring down and up the
>> interface, do you see the same issue?
> 
> I'm not quite sure what you mean, but if the interface is configured
> (and NetworkManager is stopped), I can do 'ip link set eth0 down' and
> then 'ip link set eth0 up', and the interface is fully functional.
> 
>> What is the value of RxConfig when entering the resume function?
> 
> I added a dev_info() to rtl8169_resume(). First with NetworkManager
> active (i.e. interface down on suspend):
> 
> [  525.956675] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> 
> Then I re-tried with NetworkManager stopped (i.e. interface up on
> suspend). Same result:
> 
> [  785.413887] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> 
> I hope that's what you were asking for...
> 
> Petr T
> 

rtl8169_resume() has been changed in 5.9, therefore the patch doesn't
apply cleanly on older kernel versions. Can you test the following
on a 5.9-rc version or linux-next?

Thanks, Heiner


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 9e4e6a883..4fb49fd0d 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4837,6 +4837,10 @@ static int rtl8169_resume(struct device *device)
 
 	rtl_rar_set(tp, tp->dev->dev_addr);
 
+	/* Reportedly at least Asus X453MA corrupts packets otherwise */
+	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
+		rtl_init_rxcfg(tp);
+
 	if (tp->TxDescArray)
 		rtl8169_up(tp);
 
-- 
2.28.0

















^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-24 20:12             ` Heiner Kallweit
@ 2020-09-25  7:30               ` Petr Tesarik
  2020-09-25  8:54                 ` Petr Tesarik
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Tesarik @ 2020-09-25  7:30 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 3447 bytes --]

On Thu, 24 Sep 2020 22:12:24 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 24.09.2020 21:14, Petr Tesarik wrote:
> > On Wed, 23 Sep 2020 11:57:41 +0200
> > Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > 
> >> On 03.09.2020 10:41, Petr Tesarik wrote:
> >>> Hi Heiner,
> >>>
> >>> this issue was on the back-burner for some time, but I've got some
> >>> interesting news now.
> >>>
> >>> On Sat, 18 Jul 2020 14:07:50 +0200
> >>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>   
> >>>> [...]
> >>>> Maybe the following gives us an idea:
> >>>> Please do "ethtool -d <if>" after boot and after resume from suspend,
> >>>> and check for differences.  
> >>>
> >>> The register dump did not reveal anything of interest - the only
> >>> differences were in the physical addresses after a device reopen.
> >>>
> >>> However, knowing that reloading the driver can fix the issue, I copied
> >>> the initialization sequence from init_one() to rtl8169_resume() and
> >>> gave it a try. That works!
> >>>
> >>> Then I started removing the initialization calls one by one. This
> >>> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
> >>> RxConfig register. In other words, these is the difference between
> >>> 5.8.4 and my working version:
> >>>
> >>> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
> >>> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
> >>> @@ -4925,6 +4925,9 @@
> >>>  
> >>>  	clk_prepare_enable(tp->clk);
> >>>  
> >>> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
> >>> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
> >>> +
> >>>  	if (netif_running(tp->dev))
> >>>  		__rtl8169_resume(tp);
> >>>  
> >>> This is quite surprising, at least when the device is managed by
> >>> NetworkManager, because then it is closed on wakeup, and the open
> >>> method should call rtl_init_rxcfg() anyway. So, it might be a timing
> >>> issue, or incorrect order of register writes.
> >>>   
> >> Thanks for the analysis. If you manually bring down and up the
> >> interface, do you see the same issue?
> > 
> > I'm not quite sure what you mean, but if the interface is configured
> > (and NetworkManager is stopped), I can do 'ip link set eth0 down' and
> > then 'ip link set eth0 up', and the interface is fully functional.
> > 
> >> What is the value of RxConfig when entering the resume function?
> > 
> > I added a dev_info() to rtl8169_resume(). First with NetworkManager
> > active (i.e. interface down on suspend):
> > 
> > [  525.956675] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> > 
> > Then I re-tried with NetworkManager stopped (i.e. interface up on
> > suspend). Same result:
> > 
> > [  785.413887] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> > 
> > I hope that's what you were asking for...
> > 
> > Petr T
> > 
> 
> rtl8169_resume() has been changed in 5.9, therefore the patch doesn't
> apply cleanly on older kernel versions. Can you test the following
> on a 5.9-rc version or linux-next?

I tried installing 5.9-rc6, but it freezes hard at boot, last message is:

[   14.916259] libphy: r8169: probed

At this point, I suspect you're right that the BIOS is seriously buggy.
Let me check if ASUSTek has released any update for this model.

Thanks for your effort so far!

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-25  7:30               ` Petr Tesarik
@ 2020-09-25  8:54                 ` Petr Tesarik
  2020-09-25  9:44                   ` Heiner Kallweit
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Tesarik @ 2020-09-25  8:54 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 4263 bytes --]

On Fri, 25 Sep 2020 09:30:37 +0200
Petr Tesarik <ptesarik@suse.cz> wrote:

> On Thu, 24 Sep 2020 22:12:24 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
> > On 24.09.2020 21:14, Petr Tesarik wrote:  
> > > On Wed, 23 Sep 2020 11:57:41 +0200
> > > Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >   
> > >> On 03.09.2020 10:41, Petr Tesarik wrote:  
> > >>> Hi Heiner,
> > >>>
> > >>> this issue was on the back-burner for some time, but I've got some
> > >>> interesting news now.
> > >>>
> > >>> On Sat, 18 Jul 2020 14:07:50 +0200
> > >>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> > >>>     
> > >>>> [...]
> > >>>> Maybe the following gives us an idea:
> > >>>> Please do "ethtool -d <if>" after boot and after resume from suspend,
> > >>>> and check for differences.    
> > >>>
> > >>> The register dump did not reveal anything of interest - the only
> > >>> differences were in the physical addresses after a device reopen.
> > >>>
> > >>> However, knowing that reloading the driver can fix the issue, I copied
> > >>> the initialization sequence from init_one() to rtl8169_resume() and
> > >>> gave it a try. That works!
> > >>>
> > >>> Then I started removing the initialization calls one by one. This
> > >>> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
> > >>> RxConfig register. In other words, these is the difference between
> > >>> 5.8.4 and my working version:
> > >>>
> > >>> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
> > >>> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
> > >>> @@ -4925,6 +4925,9 @@
> > >>>  
> > >>>  	clk_prepare_enable(tp->clk);
> > >>>  
> > >>> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
> > >>> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
> > >>> +
> > >>>  	if (netif_running(tp->dev))
> > >>>  		__rtl8169_resume(tp);
> > >>>  
> > >>> This is quite surprising, at least when the device is managed by
> > >>> NetworkManager, because then it is closed on wakeup, and the open
> > >>> method should call rtl_init_rxcfg() anyway. So, it might be a timing
> > >>> issue, or incorrect order of register writes.
> > >>>     
> > >> Thanks for the analysis. If you manually bring down and up the
> > >> interface, do you see the same issue?  
> > > 
> > > I'm not quite sure what you mean, but if the interface is configured
> > > (and NetworkManager is stopped), I can do 'ip link set eth0 down' and
> > > then 'ip link set eth0 up', and the interface is fully functional.
> > >   
> > >> What is the value of RxConfig when entering the resume function?  
> > > 
> > > I added a dev_info() to rtl8169_resume(). First with NetworkManager
> > > active (i.e. interface down on suspend):
> > > 
> > > [  525.956675] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> > > 
> > > Then I re-tried with NetworkManager stopped (i.e. interface up on
> > > suspend). Same result:
> > > 
> > > [  785.413887] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> > > 
> > > I hope that's what you were asking for...
> > > 
> > > Petr T
> > >   
> > 
> > rtl8169_resume() has been changed in 5.9, therefore the patch doesn't
> > apply cleanly on older kernel versions. Can you test the following
> > on a 5.9-rc version or linux-next?  
> 
> I tried installing 5.9-rc6, but it freezes hard at boot, last message is:
> 
> [   14.916259] libphy: r8169: probed
> 
> At this point, I suspect you're right that the BIOS is seriously buggy.
> Let me check if ASUSTek has released any update for this model.

Hm, it took me about an hour wondering why I cannot flash the 314 update, but then I finally noticed that this was for X543, while mine is an X453... *sigh*

So, I'm at BIOS version 214, released in 2015, and that's the latest version. There are some older versions available, but the BIOS Flash utility won't let me downgrade.

Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-25  8:54                 ` Petr Tesarik
@ 2020-09-25  9:44                   ` Heiner Kallweit
  2020-09-25  9:52                     ` Petr Tesarik
  0 siblings, 1 reply; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-25  9:44 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Realtek linux nic maintainers, netdev

On 25.09.2020 10:54, Petr Tesarik wrote:
> On Fri, 25 Sep 2020 09:30:37 +0200
> Petr Tesarik <ptesarik@suse.cz> wrote:
> 
>> On Thu, 24 Sep 2020 22:12:24 +0200
>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>>> On 24.09.2020 21:14, Petr Tesarik wrote:  
>>>> On Wed, 23 Sep 2020 11:57:41 +0200
>>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>   
>>>>> On 03.09.2020 10:41, Petr Tesarik wrote:  
>>>>>> Hi Heiner,
>>>>>>
>>>>>> this issue was on the back-burner for some time, but I've got some
>>>>>> interesting news now.
>>>>>>
>>>>>> On Sat, 18 Jul 2020 14:07:50 +0200
>>>>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>     
>>>>>>> [...]
>>>>>>> Maybe the following gives us an idea:
>>>>>>> Please do "ethtool -d <if>" after boot and after resume from suspend,
>>>>>>> and check for differences.    
>>>>>>
>>>>>> The register dump did not reveal anything of interest - the only
>>>>>> differences were in the physical addresses after a device reopen.
>>>>>>
>>>>>> However, knowing that reloading the driver can fix the issue, I copied
>>>>>> the initialization sequence from init_one() to rtl8169_resume() and
>>>>>> gave it a try. That works!
>>>>>>
>>>>>> Then I started removing the initialization calls one by one. This
>>>>>> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
>>>>>> RxConfig register. In other words, these is the difference between
>>>>>> 5.8.4 and my working version:
>>>>>>
>>>>>> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
>>>>>> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
>>>>>> @@ -4925,6 +4925,9 @@
>>>>>>  
>>>>>>  	clk_prepare_enable(tp->clk);
>>>>>>  
>>>>>> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
>>>>>> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
>>>>>> +
>>>>>>  	if (netif_running(tp->dev))
>>>>>>  		__rtl8169_resume(tp);
>>>>>>  
>>>>>> This is quite surprising, at least when the device is managed by
>>>>>> NetworkManager, because then it is closed on wakeup, and the open
>>>>>> method should call rtl_init_rxcfg() anyway. So, it might be a timing
>>>>>> issue, or incorrect order of register writes.
>>>>>>     
>>>>> Thanks for the analysis. If you manually bring down and up the
>>>>> interface, do you see the same issue?  
>>>>
>>>> I'm not quite sure what you mean, but if the interface is configured
>>>> (and NetworkManager is stopped), I can do 'ip link set eth0 down' and
>>>> then 'ip link set eth0 up', and the interface is fully functional.
>>>>   
>>>>> What is the value of RxConfig when entering the resume function?  
>>>>
>>>> I added a dev_info() to rtl8169_resume(). First with NetworkManager
>>>> active (i.e. interface down on suspend):
>>>>
>>>> [  525.956675] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
>>>>
>>>> Then I re-tried with NetworkManager stopped (i.e. interface up on
>>>> suspend). Same result:
>>>>
>>>> [  785.413887] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
>>>>
>>>> I hope that's what you were asking for...
>>>>
>>>> Petr T
>>>>   
>>>
>>> rtl8169_resume() has been changed in 5.9, therefore the patch doesn't
>>> apply cleanly on older kernel versions. Can you test the following
>>> on a 5.9-rc version or linux-next?  
>>
>> I tried installing 5.9-rc6, but it freezes hard at boot, last message is:
>>
>> [   14.916259] libphy: r8169: probed
>>

This doesn't necessarily mean that the r8169 driver crashes the system.
Other things could run in parallel. It freezes w/o any message?

>> At this point, I suspect you're right that the BIOS is seriously buggy.
>> Let me check if ASUSTek has released any update for this model.
> 
> Hm, it took me about an hour wondering why I cannot flash the 314 update, but then I finally noticed that this was for X543, while mine is an X453... *sigh*
> 
> So, I'm at BIOS version 214, released in 2015, and that's the latest version. There are some older versions available, but the BIOS Flash utility won't let me downgrade.
> 
> Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
> 
If the workaround is small and there's little chance to break other stuff: then usually yes.
If you can spend the effort to bisect the issue, this would be appreciated.


> Petr T
> 
Heiner

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-25  9:44                   ` Heiner Kallweit
@ 2020-09-25  9:52                     ` Petr Tesarik
  2020-09-25 12:56                       ` Petr Tesarik
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Tesarik @ 2020-09-25  9:52 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 4736 bytes --]

On Fri, 25 Sep 2020 11:44:09 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 25.09.2020 10:54, Petr Tesarik wrote:
> > On Fri, 25 Sep 2020 09:30:37 +0200
> > Petr Tesarik <ptesarik@suse.cz> wrote:
> >   
> >> On Thu, 24 Sep 2020 22:12:24 +0200
> >> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>  
> >>> On 24.09.2020 21:14, Petr Tesarik wrote:    
> >>>> On Wed, 23 Sep 2020 11:57:41 +0200
> >>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>     
> >>>>> On 03.09.2020 10:41, Petr Tesarik wrote:    
> >>>>>> Hi Heiner,
> >>>>>>
> >>>>>> this issue was on the back-burner for some time, but I've got some
> >>>>>> interesting news now.
> >>>>>>
> >>>>>> On Sat, 18 Jul 2020 14:07:50 +0200
> >>>>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>       
> >>>>>>> [...]
> >>>>>>> Maybe the following gives us an idea:
> >>>>>>> Please do "ethtool -d <if>" after boot and after resume from suspend,
> >>>>>>> and check for differences.      
> >>>>>>
> >>>>>> The register dump did not reveal anything of interest - the only
> >>>>>> differences were in the physical addresses after a device reopen.
> >>>>>>
> >>>>>> However, knowing that reloading the driver can fix the issue, I copied
> >>>>>> the initialization sequence from init_one() to rtl8169_resume() and
> >>>>>> gave it a try. That works!
> >>>>>>
> >>>>>> Then I started removing the initialization calls one by one. This
> >>>>>> exercise left me with a call to rtl_init_rxcfg(), which simply sets the
> >>>>>> RxConfig register. In other words, these is the difference between
> >>>>>> 5.8.4 and my working version:
> >>>>>>
> >>>>>> --- linux-orig/drivers/net/ethernet/realtek/r8169_main.c	2020-09-02 22:43:09.361951750 +0200
> >>>>>> +++ linux/drivers/net/ethernet/realtek/r8169_main.c	2020-09-03 10:36:23.915803703 +0200
> >>>>>> @@ -4925,6 +4925,9 @@
> >>>>>>  
> >>>>>>  	clk_prepare_enable(tp->clk);
> >>>>>>  
> >>>>>> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
> >>>>>> +		RTL_W32(tp, RxConfig, RX128_INT_EN | RX_DMA_BURST);
> >>>>>> +
> >>>>>>  	if (netif_running(tp->dev))
> >>>>>>  		__rtl8169_resume(tp);
> >>>>>>  
> >>>>>> This is quite surprising, at least when the device is managed by
> >>>>>> NetworkManager, because then it is closed on wakeup, and the open
> >>>>>> method should call rtl_init_rxcfg() anyway. So, it might be a timing
> >>>>>> issue, or incorrect order of register writes.
> >>>>>>       
> >>>>> Thanks for the analysis. If you manually bring down and up the
> >>>>> interface, do you see the same issue?    
> >>>>
> >>>> I'm not quite sure what you mean, but if the interface is configured
> >>>> (and NetworkManager is stopped), I can do 'ip link set eth0 down' and
> >>>> then 'ip link set eth0 up', and the interface is fully functional.
> >>>>     
> >>>>> What is the value of RxConfig when entering the resume function?    
> >>>>
> >>>> I added a dev_info() to rtl8169_resume(). First with NetworkManager
> >>>> active (i.e. interface down on suspend):
> >>>>
> >>>> [  525.956675] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> >>>>
> >>>> Then I re-tried with NetworkManager stopped (i.e. interface up on
> >>>> suspend). Same result:
> >>>>
> >>>> [  785.413887] r8169 0000:03:00.2: RxConfig after resume: 0x0002400f
> >>>>
> >>>> I hope that's what you were asking for...
> >>>>
> >>>> Petr T
> >>>>     
> >>>
> >>> rtl8169_resume() has been changed in 5.9, therefore the patch doesn't
> >>> apply cleanly on older kernel versions. Can you test the following
> >>> on a 5.9-rc version or linux-next?    
> >>
> >> I tried installing 5.9-rc6, but it freezes hard at boot, last message is:
> >>
> >> [   14.916259] libphy: r8169: probed
> >>  
> 
> This doesn't necessarily mean that the r8169 driver crashes the system.
> Other things could run in parallel. It freezes w/o any message?

The system freezes hard. I have already encountered a similar freeze
with the alternative r8169 driver, so it's quite likely related.

> >> At this point, I suspect you're right that the BIOS is seriously buggy.
> >> Let me check if ASUSTek has released any update for this model.  
> > 
>[...]
> > Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
> >   
> If the workaround is small and there's little chance to break other stuff: then usually yes.
> If you can spend the effort to bisect the issue, this would be appreciated.

OK, then I'm going to give it a try.

Stay tuned,
Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-25  9:52                     ` Petr Tesarik
@ 2020-09-25 12:56                       ` Petr Tesarik
  2020-09-25 14:47                         ` Heiner Kallweit
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Tesarik @ 2020-09-25 12:56 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 1449 bytes --]

On Fri, 25 Sep 2020 11:52:41 +0200
Petr Tesarik <ptesarik@suse.cz> wrote:

> On Fri, 25 Sep 2020 11:44:09 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
> > On 25.09.2020 10:54, Petr Tesarik wrote:  
>[...]
> > > Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
> > >     
> > If the workaround is small and there's little chance to break other stuff: then usually yes.
> > If you can spend the effort to bisect the issue, this would be appreciated.  
> 
> OK, then I'm going to give it a try.

Done. The system freezes when this commit is applied:

commit 9f0b54cd167219266bd3864570ae8f4987b57520
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Wed Jun 17 22:55:40 2020 +0200

    r8169: move switching optional clock on/off to pll power functions
    
    Relevant chip clocks are disabled in rtl_pll_power_down(), therefore
    move calling clk_disable_unprepare() there. Similar for enabling the
    clock.
    
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

I cannot be sure this is related to the malfunction after resume reported earlier, but it again touches the suspend/resume path...

Anything else I should try?

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-25 12:56                       ` Petr Tesarik
@ 2020-09-25 14:47                         ` Heiner Kallweit
  2020-09-29 19:07                           ` Petr Tesarik
  0 siblings, 1 reply; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-25 14:47 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Realtek linux nic maintainers, netdev

On 25.09.2020 14:56, Petr Tesarik wrote:
> On Fri, 25 Sep 2020 11:52:41 +0200
> Petr Tesarik <ptesarik@suse.cz> wrote:
> 
>> On Fri, 25 Sep 2020 11:44:09 +0200
>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>>> On 25.09.2020 10:54, Petr Tesarik wrote:  
>> [...]
>>>> Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
>>>>     
>>> If the workaround is small and there's little chance to break other stuff: then usually yes.
>>> If you can spend the effort to bisect the issue, this would be appreciated.  
>>
>> OK, then I'm going to give it a try.
> 
> Done. The system freezes when this commit is applied:
> 
> commit 9f0b54cd167219266bd3864570ae8f4987b57520
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Wed Jun 17 22:55:40 2020 +0200
> 
>     r8169: move switching optional clock on/off to pll power functions
> 
This sounds weird. On your system tp->clk should be NULL, making
clk_prepare_enable() et al no-ops. Please check whether tp->clk
is NULL after the call to rtl_get_ether_clk().

    
>     Relevant chip clocks are disabled in rtl_pll_power_down(), therefore
>     move calling clk_disable_unprepare() there. Similar for enabling the
>     clock.
>     
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I cannot be sure this is related to the malfunction after resume reported earlier, but it again touches the suspend/resume path...
> 
> Anything else I should try?
> 
> Petr T
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-25 14:47                         ` Heiner Kallweit
@ 2020-09-29 19:07                           ` Petr Tesarik
  2020-09-29 19:41                             ` Heiner Kallweit
  2020-09-29 20:08                             ` Hans de Goede
  0 siblings, 2 replies; 34+ messages in thread
From: Petr Tesarik @ 2020-09-29 19:07 UTC (permalink / raw)
  To: Heiner Kallweit, Hans de Goede
  Cc: Realtek linux nic maintainers, netdev, linux-clk

[-- Attachment #1: Type: text/plain, Size: 2759 bytes --]

Hi Heiner (and now also Hans)!

@Hans: I'm adding you to this conversation, because you're the author
of commit b1e3454d39f99, which seems to break the r8169 driver on a
laptop of mine.

On Fri, 25 Sep 2020 16:47:54 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 25.09.2020 14:56, Petr Tesarik wrote:
> > On Fri, 25 Sep 2020 11:52:41 +0200
> > Petr Tesarik <ptesarik@suse.cz> wrote:
> >   
> >> On Fri, 25 Sep 2020 11:44:09 +0200
> >> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>  
> >>> On 25.09.2020 10:54, Petr Tesarik wrote:    
> >> [...]  
> >>>> Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
> >>>>       
> >>> If the workaround is small and there's little chance to break other stuff: then usually yes.
> >>> If you can spend the effort to bisect the issue, this would be appreciated.    
> >>
> >> OK, then I'm going to give it a try.  
> > 
> > Done. The system freezes when this commit is applied:
> > 
> > commit 9f0b54cd167219266bd3864570ae8f4987b57520
> > Author: Heiner Kallweit <hkallweit1@gmail.com>
> > Date:   Wed Jun 17 22:55:40 2020 +0200
> > 
> >     r8169: move switching optional clock on/off to pll power functions
> >   
> This sounds weird. On your system tp->clk should be NULL, making
> clk_prepare_enable() et al no-ops. Please check whether tp->clk
> is NULL after the call to rtl_get_ether_clk().

This might be part of the issue. On my system tp->clk is definitely not
NULL:

crash> *rtl8169_private.clk 0xffff9277aca58940
  clk = 0xffff9277ac2c82a0

crash> *clk 0xffff9277ac2c82a0
struct clk {
  core = 0xffff9277aef65d00, 
  dev = 0xffff9277aed000b0, 
  dev_id = 0xffff9277aec60c00 "0000:03:00.2", 
  con_id = 0xffff9277ad04b080 "ether_clk", 
  min_rate = 0, 
  max_rate = 18446744073709551615, 
  exclusive_count = 0, 
  clks_node = {
    next = 0xffff9277ad2428d8, 
    pprev = 0xffff9277aef65dc8
  }
}

The dev_id corresponds to the Ethernet controller:

03:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 06)

Looking at clk_find(), it matches this entry in clocks:

struct clk_lookup {
  node = {
    next = 0xffffffffbc702f40, 
    prev = 0xffff9277bf7190c0
  }, 
  dev_id = 0x0, 
  con_id = 0xffff9277bf719524 "ether_clk", 
  clk = 0x0, 
  clk_hw = 0xffff9277ad2427f8
}

That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking
at the platform initialization code, the "ether_clk" alias is created
unconditionally. Hans already added.

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-29 19:07                           ` Petr Tesarik
@ 2020-09-29 19:41                             ` Heiner Kallweit
  2020-09-29 20:08                             ` Hans de Goede
  1 sibling, 0 replies; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-29 19:41 UTC (permalink / raw)
  To: Petr Tesarik, Hans de Goede
  Cc: Realtek linux nic maintainers, netdev, linux-clk

On 29.09.2020 21:07, Petr Tesarik wrote:
> Hi Heiner (and now also Hans)!
> 
> @Hans: I'm adding you to this conversation, because you're the author
> of commit b1e3454d39f99, which seems to break the r8169 driver on a
> laptop of mine.
> 
> On Fri, 25 Sep 2020 16:47:54 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 25.09.2020 14:56, Petr Tesarik wrote:
>>> On Fri, 25 Sep 2020 11:52:41 +0200
>>> Petr Tesarik <ptesarik@suse.cz> wrote:
>>>   
>>>> On Fri, 25 Sep 2020 11:44:09 +0200
>>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>  
>>>>> On 25.09.2020 10:54, Petr Tesarik wrote:    
>>>> [...]  
>>>>>> Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
>>>>>>       
>>>>> If the workaround is small and there's little chance to break other stuff: then usually yes.
>>>>> If you can spend the effort to bisect the issue, this would be appreciated.    
>>>>
>>>> OK, then I'm going to give it a try.  
>>>
>>> Done. The system freezes when this commit is applied:
>>>
>>> commit 9f0b54cd167219266bd3864570ae8f4987b57520
>>> Author: Heiner Kallweit <hkallweit1@gmail.com>
>>> Date:   Wed Jun 17 22:55:40 2020 +0200
>>>
>>>     r8169: move switching optional clock on/off to pll power functions
>>>   
>> This sounds weird. On your system tp->clk should be NULL, making
>> clk_prepare_enable() et al no-ops. Please check whether tp->clk
>> is NULL after the call to rtl_get_ether_clk().
> 
> This might be part of the issue. On my system tp->clk is definitely not
> NULL:
> 
OK, interesting. The referenced patch changes driver behavior in a way
that ether_clk is disabled again in probe(), and only re-enabled
in ndo_open(). This should be helpful from a power-saving perspective
because before that we enabled the clock even if the network device
wasn't used.
It seems however that disabling ether_clk has (at least on your system)
side effects causing a system freeze. That's a best guess from my side
however, and not really something I can comment on.

> crash> *rtl8169_private.clk 0xffff9277aca58940
>   clk = 0xffff9277ac2c82a0
> 
> crash> *clk 0xffff9277ac2c82a0
> struct clk {
>   core = 0xffff9277aef65d00, 
>   dev = 0xffff9277aed000b0, 
>   dev_id = 0xffff9277aec60c00 "0000:03:00.2", 
>   con_id = 0xffff9277ad04b080 "ether_clk", 
>   min_rate = 0, 
>   max_rate = 18446744073709551615, 
>   exclusive_count = 0, 
>   clks_node = {
>     next = 0xffff9277ad2428d8, 
>     pprev = 0xffff9277aef65dc8
>   }
> }
> 
> The dev_id corresponds to the Ethernet controller:
> 
> 03:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 06)
> 
> Looking at clk_find(), it matches this entry in clocks:
> 
> struct clk_lookup {
>   node = {
>     next = 0xffffffffbc702f40, 
>     prev = 0xffff9277bf7190c0
>   }, 
>   dev_id = 0x0, 
>   con_id = 0xffff9277bf719524 "ether_clk", 
>   clk = 0x0, 
>   clk_hw = 0xffff9277ad2427f8
> }
> 
> That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking
> at the platform initialization code, the "ether_clk" alias is created
> unconditionally. Hans already added.
> 
> Petr T
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-29 19:07                           ` Petr Tesarik
  2020-09-29 19:41                             ` Heiner Kallweit
@ 2020-09-29 20:08                             ` Hans de Goede
  2020-09-29 20:29                               ` Hans de Goede
                                                 ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Hans de Goede @ 2020-09-29 20:08 UTC (permalink / raw)
  To: Petr Tesarik, Heiner Kallweit
  Cc: Realtek linux nic maintainers, netdev, linux-clk

Hi,

On 9/29/20 9:07 PM, Petr Tesarik wrote:
> Hi Heiner (and now also Hans)!
> 
> @Hans: I'm adding you to this conversation, because you're the author
> of commit b1e3454d39f99, which seems to break the r8169 driver on a
> laptop of mine.

Erm, no, as you bi-sected yourself already commit 9f0b54cd167219
("r8169: move switching optional clock on/off to pll power functions")

Broke your laptop, commit b1e3454d39f99 ("clk: x86: add "ether_clk" alias
for Bay Trail / Cherry Trail") is about 18 months older.

> On Fri, 25 Sep 2020 16:47:54 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 25.09.2020 14:56, Petr Tesarik wrote:
>>> On Fri, 25 Sep 2020 11:52:41 +0200
>>> Petr Tesarik <ptesarik@suse.cz> wrote:
>>>    
>>>> On Fri, 25 Sep 2020 11:44:09 +0200
>>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>   
>>>>> On 25.09.2020 10:54, Petr Tesarik wrote:
>>>> [...]
>>>>>> Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
>>>>>>        
>>>>> If the workaround is small and there's little chance to break other stuff: then usually yes.
>>>>> If you can spend the effort to bisect the issue, this would be appreciated.
>>>>
>>>> OK, then I'm going to give it a try.
>>>
>>> Done. The system freezes when this commit is applied:
>>>
>>> commit 9f0b54cd167219266bd3864570ae8f4987b57520
>>> Author: Heiner Kallweit <hkallweit1@gmail.com>
>>> Date:   Wed Jun 17 22:55:40 2020 +0200
>>>
>>>      r8169: move switching optional clock on/off to pll power functions
>>>    
>> This sounds weird. On your system tp->clk should be NULL,

Heiner, why do you say that tp->clk should be NULL on Petr's
system? Because it is an x86 based system?

Some X86 SoCs, specifically, the more tablet oriented Bay and Cherry
Trail SoCs, which are much more ARM SoC like then other X86 SoCs do
also use the clock framework and the SoC has a number of external clk
pins which are typically used by audio codecs and by ethernet chips.

>> making
>> clk_prepare_enable() et al no-ops. Please check whether tp->clk
>> is NULL after the call to rtl_get_ether_clk().
> 
> This might be part of the issue. On my system tp->clk is definitely not
> NULL:
> 
> crash> *rtl8169_private.clk 0xffff9277aca58940
>    clk = 0xffff9277ac2c82a0
> 
> crash> *clk 0xffff9277ac2c82a0
> struct clk {
>    core = 0xffff9277aef65d00,
>    dev = 0xffff9277aed000b0,
>    dev_id = 0xffff9277aec60c00 "0000:03:00.2",
>    con_id = 0xffff9277ad04b080 "ether_clk",
>    min_rate = 0,
>    max_rate = 18446744073709551615,
>    exclusive_count = 0,
>    clks_node = {
>      next = 0xffff9277ad2428d8,
>      pprev = 0xffff9277aef65dc8
>    }
> }
> 
> The dev_id corresponds to the Ethernet controller:
> 
> 03:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 06)
> 
> Looking at clk_find(), it matches this entry in clocks:
> 
> struct clk_lookup {
>    node = {
>      next = 0xffffffffbc702f40,
>      prev = 0xffff9277bf7190c0
>    },
>    dev_id = 0x0,
>    con_id = 0xffff9277bf719524 "ether_clk",
>    clk = 0x0,
>    clk_hw = 0xffff9277ad2427f8
> }
> 
> That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking
> at the platform initialization code, the "ether_clk" alias is created
> unconditionally. Hans already added.

Petr, unconditionally is not really correct here, just as claiming
above that my commit broke things was not really correct either.

I guess this is mostly semantics, but I don't appreciate
the accusatory tone here.

The code in question binds to a clk-pmc-atom platform_device which
gets instantiated by drivers/platform/x86/pmc_atom.c. Which in turn
binds to a PCI device which is only present on Bay Trail and Cherry
Trail SoCs.

IOW the commit operates as advertised in its Subject:
"clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail"

So with that all clarified lets try to see if we can figure out
*why* this is actually happening.

Petr, can you describe your hardware in some more detail,
in the bits quoted when you first Cc-ed me there is not that
much detail. What is the vendor and model of your laptop?

Looking closer at commit 9f0b54cd167219
("r8169: move switching optional clock on/off to pll power functions")
I notice that the functions which now enable/disable the clock:
rtl_pll_power_up() and rtl_pll_power_down()

Only run when the interface is up during suspend/resume.
Petr, I guess the laptop is not connected to ethernet when you
hibernate it?

That means that on resume the clock will not be re-enabled.

This is a subtle but important change and I believe that
this is what is breaking things. I guess that the PLL which
rtl_pll_power_up() / rtl_pll_power_down() controls is only
used for ethernet-timing.  But the external clock controlled
through pt->clk is a replacement for using an external
crystal with the r8169 chip. So with it disabled, the entire
chip does not have a clock and is essentially dead.
It can then e.g. not respond to any pci-e reads/writes done
to it.

So I believe that the proper fix for this is to revert
commit 9f0b54cd167219
("r8169: move switching optional clock on/off to pll power functions")

As that caused the whole chip's clock to be left off after
a suspend/resume while the interface is down.

Also some remarks about this while I'm being a bit grumpy about
all this anyways (sorry):

1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
to pll power functions") commit's message does not seem to really
explain why this change was made...

2. If a git blame would have been done to find the commit adding
the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
then you could have known that the clk in question is an external
clock for the entire chip, the commit message pretty clearly states
this (although "the entire" part is implied only) :

"On some boards a platform clock is used as clock for the r8169 chip,
this commit adds support for getting and enabling this clock (assuming
it has an "ether_clk" alias set on it).

This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks
enabled by the firmware") which is a previous attempt to fix this for some
x86 boards, but this causes all Cherry Trail SoC using boards to not reach
there lowest power states when suspending.

This commit (together with an atom-pmc-clk driver commit adding the alias)
fixes things properly by making the r8169 get the clock and enable it when
it needs it."

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-29 20:08                             ` Hans de Goede
@ 2020-09-29 20:29                               ` Hans de Goede
  2020-09-29 20:35                               ` Heiner Kallweit
  2020-09-29 21:17                               ` Petr Tesarik
  2 siblings, 0 replies; 34+ messages in thread
From: Hans de Goede @ 2020-09-29 20:29 UTC (permalink / raw)
  To: Petr Tesarik, Heiner Kallweit
  Cc: Realtek linux nic maintainers, netdev, linux-clk

p.s.

On 9/29/20 10:08 PM, Hans de Goede wrote:

<snip>

> So I believe that the proper fix for this is to revert
> commit 9f0b54cd167219
> ("r8169: move switching optional clock on/off to pll power functions")

Heiner, assuming you agree that reverting this commit is
the best way to fix this, can you please submit a revert
for this upstream ?

With a:

Fixes: 9f0b54cd167219 ("r8169: move switching optional clock on/off to pll power functions")

Tag in the commit-message so that this gets cherry-picked into
the stable series where necessary.

Regards,

Hans



> As that caused the whole chip's clock to be left off after
> a suspend/resume while the interface is down.
> 
> Also some remarks about this while I'm being a bit grumpy about
> all this anyways (sorry):
> 
> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
> to pll power functions") commit's message does not seem to really
> explain why this change was made...
> 
> 2. If a git blame would have been done to find the commit adding
> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
> then you could have known that the clk in question is an external
> clock for the entire chip, the commit message pretty clearly states
> this (although "the entire" part is implied only) :
> 
> "On some boards a platform clock is used as clock for the r8169 chip,
> this commit adds support for getting and enabling this clock (assuming
> it has an "ether_clk" alias set on it).
> 
> This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks
> enabled by the firmware") which is a previous attempt to fix this for some
> x86 boards, but this causes all Cherry Trail SoC using boards to not reach
> there lowest power states when suspending.
> 
> This commit (together with an atom-pmc-clk driver commit adding the alias)
> fixes things properly by making the r8169 get the clock and enable it when
> it needs it."
> 
> Regards,
> 
> Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-29 20:08                             ` Hans de Goede
  2020-09-29 20:29                               ` Hans de Goede
@ 2020-09-29 20:35                               ` Heiner Kallweit
  2020-09-29 20:42                                 ` Heiner Kallweit
  2020-09-30  9:04                                 ` Hans de Goede
  2020-09-29 21:17                               ` Petr Tesarik
  2 siblings, 2 replies; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-29 20:35 UTC (permalink / raw)
  To: Hans de Goede, Petr Tesarik
  Cc: Realtek linux nic maintainers, netdev, linux-clk

On 29.09.2020 22:08, Hans de Goede wrote:
> Hi,
> 
> On 9/29/20 9:07 PM, Petr Tesarik wrote:
>> Hi Heiner (and now also Hans)!
>>
>> @Hans: I'm adding you to this conversation, because you're the author
>> of commit b1e3454d39f99, which seems to break the r8169 driver on a
>> laptop of mine.
> 
> Erm, no, as you bi-sected yourself already commit 9f0b54cd167219
> ("r8169: move switching optional clock on/off to pll power functions")
> 
> Broke your laptop, commit b1e3454d39f99 ("clk: x86: add "ether_clk" alias
> for Bay Trail / Cherry Trail") is about 18 months older.
> 
>> On Fri, 25 Sep 2020 16:47:54 +0200
>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>>> On 25.09.2020 14:56, Petr Tesarik wrote:
>>>> On Fri, 25 Sep 2020 11:52:41 +0200
>>>> Petr Tesarik <ptesarik@suse.cz> wrote:
>>>>   
>>>>> On Fri, 25 Sep 2020 11:44:09 +0200
>>>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>  
>>>>>> On 25.09.2020 10:54, Petr Tesarik wrote:
>>>>> [...]
>>>>>>> Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
>>>>>>>        
>>>>>> If the workaround is small and there's little chance to break other stuff: then usually yes.
>>>>>> If you can spend the effort to bisect the issue, this would be appreciated.
>>>>>
>>>>> OK, then I'm going to give it a try.
>>>>
>>>> Done. The system freezes when this commit is applied:
>>>>
>>>> commit 9f0b54cd167219266bd3864570ae8f4987b57520
>>>> Author: Heiner Kallweit <hkallweit1@gmail.com>
>>>> Date:   Wed Jun 17 22:55:40 2020 +0200
>>>>
>>>>      r8169: move switching optional clock on/off to pll power functions
>>>>    
>>> This sounds weird. On your system tp->clk should be NULL,
> 
> Heiner, why do you say that tp->clk should be NULL on Petr's
> system? Because it is an x86 based system?
> 
This was a misunderstanding on my side, I was under the assumption
that ether_clk applies to DT-configured systems only.

> Some X86 SoCs, specifically, the more tablet oriented Bay and Cherry
> Trail SoCs, which are much more ARM SoC like then other X86 SoCs do
> also use the clock framework and the SoC has a number of external clk
> pins which are typically used by audio codecs and by ethernet chips.
> 
>>> making
>>> clk_prepare_enable() et al no-ops. Please check whether tp->clk
>>> is NULL after the call to rtl_get_ether_clk().
>>
>> This might be part of the issue. On my system tp->clk is definitely not
>> NULL:
>>
>> crash> *rtl8169_private.clk 0xffff9277aca58940
>>    clk = 0xffff9277ac2c82a0
>>
>> crash> *clk 0xffff9277ac2c82a0
>> struct clk {
>>    core = 0xffff9277aef65d00,
>>    dev = 0xffff9277aed000b0,
>>    dev_id = 0xffff9277aec60c00 "0000:03:00.2",
>>    con_id = 0xffff9277ad04b080 "ether_clk",
>>    min_rate = 0,
>>    max_rate = 18446744073709551615,
>>    exclusive_count = 0,
>>    clks_node = {
>>      next = 0xffff9277ad2428d8,
>>      pprev = 0xffff9277aef65dc8
>>    }
>> }
>>
>> The dev_id corresponds to the Ethernet controller:
>>
>> 03:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 06)
>>
>> Looking at clk_find(), it matches this entry in clocks:
>>
>> struct clk_lookup {
>>    node = {
>>      next = 0xffffffffbc702f40,
>>      prev = 0xffff9277bf7190c0
>>    },
>>    dev_id = 0x0,
>>    con_id = 0xffff9277bf719524 "ether_clk",
>>    clk = 0x0,
>>    clk_hw = 0xffff9277ad2427f8
>> }
>>
>> That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking
>> at the platform initialization code, the "ether_clk" alias is created
>> unconditionally. Hans already added.
> 
> Petr, unconditionally is not really correct here, just as claiming
> above that my commit broke things was not really correct either.
> 
> I guess this is mostly semantics, but I don't appreciate
> the accusatory tone here.
> 
> The code in question binds to a clk-pmc-atom platform_device which
> gets instantiated by drivers/platform/x86/pmc_atom.c. Which in turn
> binds to a PCI device which is only present on Bay Trail and Cherry
> Trail SoCs.
> 
> IOW the commit operates as advertised in its Subject:
> "clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail"
> 
> So with that all clarified lets try to see if we can figure out
> *why* this is actually happening.
> 
> Petr, can you describe your hardware in some more detail,
> in the bits quoted when you first Cc-ed me there is not that
> much detail. What is the vendor and model of your laptop?
> 
> Looking closer at commit 9f0b54cd167219
> ("r8169: move switching optional clock on/off to pll power functions")
> I notice that the functions which now enable/disable the clock:
> rtl_pll_power_up() and rtl_pll_power_down()
> 
> Only run when the interface is up during suspend/resume.
> Petr, I guess the laptop is not connected to ethernet when you
> hibernate it?
> 
> That means that on resume the clock will not be re-enabled.
> 
> This is a subtle but important change and I believe that
> this is what is breaking things. I guess that the PLL which
> rtl_pll_power_up() / rtl_pll_power_down() controls is only
> used for ethernet-timing.  But the external clock controlled
> through pt->clk is a replacement for using an external
> crystal with the r8169 chip. So with it disabled, the entire
> chip does not have a clock and is essentially dead.
> It can then e.g. not respond to any pci-e reads/writes done
> to it.
> 
> So I believe that the proper fix for this is to revert
> commit 9f0b54cd167219
> ("r8169: move switching optional clock on/off to pll power functions")
> 
> As that caused the whole chip's clock to be left off after
> a suspend/resume while the interface is down.
> 
> Also some remarks about this while I'm being a bit grumpy about
> all this anyways (sorry):
> 
> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
> to pll power functions") commit's message does not seem to really
> explain why this change was made...
> 
> 2. If a git blame would have been done to find the commit adding
> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
> then you could have known that the clk in question is an external
> clock for the entire chip, the commit message pretty clearly states
> this (although "the entire" part is implied only) :
> 
> "On some boards a platform clock is used as clock for the r8169 chip,
> this commit adds support for getting and enabling this clock (assuming
> it has an "ether_clk" alias set on it).
> 
Even if the missing clock would stop the network chip completely,
this shouldn't freeze the system as described by Petr.
In some old RTL8169S spec an external 25MHz clock is mentioned,
what matches the MII bus frequency. Therefore I'm not 100% convinced
that the clock is needed for basic chip operation, but due to
Realtek not releasing datasheets I can't verify this.

But yes, if reverting this change avoids the issue on Petr's system,
then we should do it. A simple mechanical revert wouldn't work because
source file structure has changed since then, so I'll prepare a patch
that effectively reverts the change.

> This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks
> enabled by the firmware") which is a previous attempt to fix this for some
> x86 boards, but this causes all Cherry Trail SoC using boards to not reach
> there lowest power states when suspending.
> 
> This commit (together with an atom-pmc-clk driver commit adding the alias)
> fixes things properly by making the r8169 get the clock and enable it when
> it needs it."
> 
> Regards,
> 
> Hans
> 

Heiner

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-29 20:35                               ` Heiner Kallweit
@ 2020-09-29 20:42                                 ` Heiner Kallweit
  2020-09-30  9:04                                 ` Hans de Goede
  1 sibling, 0 replies; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-29 20:42 UTC (permalink / raw)
  To: Hans de Goede, Petr Tesarik
  Cc: Realtek linux nic maintainers, netdev, linux-clk

On 29.09.2020 22:35, Heiner Kallweit wrote:
> On 29.09.2020 22:08, Hans de Goede wrote:
>> Hi,
>>
>> On 9/29/20 9:07 PM, Petr Tesarik wrote:
>>> Hi Heiner (and now also Hans)!
>>>
>>> @Hans: I'm adding you to this conversation, because you're the author
>>> of commit b1e3454d39f99, which seems to break the r8169 driver on a
>>> laptop of mine.
>>
>> Erm, no, as you bi-sected yourself already commit 9f0b54cd167219
>> ("r8169: move switching optional clock on/off to pll power functions")
>>
>> Broke your laptop, commit b1e3454d39f99 ("clk: x86: add "ether_clk" alias
>> for Bay Trail / Cherry Trail") is about 18 months older.
>>
>>> On Fri, 25 Sep 2020 16:47:54 +0200
>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>
>>>> On 25.09.2020 14:56, Petr Tesarik wrote:
>>>>> On Fri, 25 Sep 2020 11:52:41 +0200
>>>>> Petr Tesarik <ptesarik@suse.cz> wrote:
>>>>>   
>>>>>> On Fri, 25 Sep 2020 11:44:09 +0200
>>>>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>  
>>>>>>> On 25.09.2020 10:54, Petr Tesarik wrote:
>>>>>> [...]
>>>>>>>> Does it make sense to bisect the change that broke the driver for me, or should I rather dispose of this waste^Wlaptop in an environmentally friendly manner? I mean, would you eventually accept a workaround for a few machines with a broken BIOS?
>>>>>>>>        
>>>>>>> If the workaround is small and there's little chance to break other stuff: then usually yes.
>>>>>>> If you can spend the effort to bisect the issue, this would be appreciated.
>>>>>>
>>>>>> OK, then I'm going to give it a try.
>>>>>
>>>>> Done. The system freezes when this commit is applied:
>>>>>
>>>>> commit 9f0b54cd167219266bd3864570ae8f4987b57520
>>>>> Author: Heiner Kallweit <hkallweit1@gmail.com>
>>>>> Date:   Wed Jun 17 22:55:40 2020 +0200
>>>>>
>>>>>      r8169: move switching optional clock on/off to pll power functions
>>>>>    
>>>> This sounds weird. On your system tp->clk should be NULL,
>>
>> Heiner, why do you say that tp->clk should be NULL on Petr's
>> system? Because it is an x86 based system?
>>
> This was a misunderstanding on my side, I was under the assumption
> that ether_clk applies to DT-configured systems only.
> 
>> Some X86 SoCs, specifically, the more tablet oriented Bay and Cherry
>> Trail SoCs, which are much more ARM SoC like then other X86 SoCs do
>> also use the clock framework and the SoC has a number of external clk
>> pins which are typically used by audio codecs and by ethernet chips.
>>
>>>> making
>>>> clk_prepare_enable() et al no-ops. Please check whether tp->clk
>>>> is NULL after the call to rtl_get_ether_clk().
>>>
>>> This might be part of the issue. On my system tp->clk is definitely not
>>> NULL:
>>>
>>> crash> *rtl8169_private.clk 0xffff9277aca58940
>>>    clk = 0xffff9277ac2c82a0
>>>
>>> crash> *clk 0xffff9277ac2c82a0
>>> struct clk {
>>>    core = 0xffff9277aef65d00,
>>>    dev = 0xffff9277aed000b0,
>>>    dev_id = 0xffff9277aec60c00 "0000:03:00.2",
>>>    con_id = 0xffff9277ad04b080 "ether_clk",
>>>    min_rate = 0,
>>>    max_rate = 18446744073709551615,
>>>    exclusive_count = 0,
>>>    clks_node = {
>>>      next = 0xffff9277ad2428d8,
>>>      pprev = 0xffff9277aef65dc8
>>>    }
>>> }
>>>
>>> The dev_id corresponds to the Ethernet controller:
>>>
>>> 03:00.2 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 06)
>>>
>>> Looking at clk_find(), it matches this entry in clocks:
>>>
>>> struct clk_lookup {
>>>    node = {
>>>      next = 0xffffffffbc702f40,
>>>      prev = 0xffff9277bf7190c0
>>>    },
>>>    dev_id = 0x0,
>>>    con_id = 0xffff9277bf719524 "ether_clk",
>>>    clk = 0x0,
>>>    clk_hw = 0xffff9277ad2427f8
>>> }
>>>
>>> That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking
>>> at the platform initialization code, the "ether_clk" alias is created
>>> unconditionally. Hans already added.
>>
>> Petr, unconditionally is not really correct here, just as claiming
>> above that my commit broke things was not really correct either.
>>
>> I guess this is mostly semantics, but I don't appreciate
>> the accusatory tone here.
>>
>> The code in question binds to a clk-pmc-atom platform_device which
>> gets instantiated by drivers/platform/x86/pmc_atom.c. Which in turn
>> binds to a PCI device which is only present on Bay Trail and Cherry
>> Trail SoCs.
>>
>> IOW the commit operates as advertised in its Subject:
>> "clk: x86: add "ether_clk" alias for Bay Trail / Cherry Trail"
>>
>> So with that all clarified lets try to see if we can figure out
>> *why* this is actually happening.
>>
>> Petr, can you describe your hardware in some more detail,
>> in the bits quoted when you first Cc-ed me there is not that
>> much detail. What is the vendor and model of your laptop?
>>
>> Looking closer at commit 9f0b54cd167219
>> ("r8169: move switching optional clock on/off to pll power functions")
>> I notice that the functions which now enable/disable the clock:
>> rtl_pll_power_up() and rtl_pll_power_down()
>>
>> Only run when the interface is up during suspend/resume.
>> Petr, I guess the laptop is not connected to ethernet when you
>> hibernate it?
>>
>> That means that on resume the clock will not be re-enabled.
>>
>> This is a subtle but important change and I believe that
>> this is what is breaking things. I guess that the PLL which
>> rtl_pll_power_up() / rtl_pll_power_down() controls is only
>> used for ethernet-timing.  But the external clock controlled
>> through pt->clk is a replacement for using an external
>> crystal with the r8169 chip. So with it disabled, the entire
>> chip does not have a clock and is essentially dead.
>> It can then e.g. not respond to any pci-e reads/writes done
>> to it.
>>
>> So I believe that the proper fix for this is to revert
>> commit 9f0b54cd167219
>> ("r8169: move switching optional clock on/off to pll power functions")
>>
>> As that caused the whole chip's clock to be left off after
>> a suspend/resume while the interface is down.
>>
>> Also some remarks about this while I'm being a bit grumpy about
>> all this anyways (sorry):
>>
>> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
>> to pll power functions") commit's message does not seem to really
>> explain why this change was made...
>>
>> 2. If a git blame would have been done to find the commit adding
>> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
>> then you could have known that the clk in question is an external
>> clock for the entire chip, the commit message pretty clearly states
>> this (although "the entire" part is implied only) :
>>
>> "On some boards a platform clock is used as clock for the r8169 chip,
>> this commit adds support for getting and enabling this clock (assuming
>> it has an "ether_clk" alias set on it).
>>
> Even if the missing clock would stop the network chip completely,
> this shouldn't freeze the system as described by Petr.
> In some old RTL8169S spec an external 25MHz clock is mentioned,
> what matches the MII bus frequency. Therefore I'm not 100% convinced
> that the clock is needed for basic chip operation, but due to
> Realtek not releasing datasheets I can't verify this.
> 
> But yes, if reverting this change avoids the issue on Petr's system,
> then we should do it. A simple mechanical revert wouldn't work because
> source file structure has changed since then, so I'll prepare a patch
> that effectively reverts the change.
> 
Uups, the last part was written keeping the original introduction
of handling ether_clk in mind. My later change can be simply reverted.


>> This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks
>> enabled by the firmware") which is a previous attempt to fix this for some
>> x86 boards, but this causes all Cherry Trail SoC using boards to not reach
>> there lowest power states when suspending.
>>
>> This commit (together with an atom-pmc-clk driver commit adding the alias)
>> fixes things properly by making the r8169 get the clock and enable it when
>> it needs it."
>>
>> Regards,
>>
>> Hans
>>
> 
> Heiner
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-29 20:08                             ` Hans de Goede
  2020-09-29 20:29                               ` Hans de Goede
  2020-09-29 20:35                               ` Heiner Kallweit
@ 2020-09-29 21:17                               ` Petr Tesarik
  2 siblings, 0 replies; 34+ messages in thread
From: Petr Tesarik @ 2020-09-29 21:17 UTC (permalink / raw)
  To: Hans de Goede
  Cc: Heiner Kallweit, Realtek linux nic maintainers, netdev, linux-clk

[-- Attachment #1: Type: text/plain, Size: 4854 bytes --]

Hi,

On Tue, 29 Sep 2020 22:08:56 +0200
Hans de Goede <hdegoede@redhat.com> wrote:

> Hi,
> 
> On 9/29/20 9:07 PM, Petr Tesarik wrote:
> > Hi Heiner (and now also Hans)!
> > 
> > @Hans: I'm adding you to this conversation, because you're the author
> > of commit b1e3454d39f99, which seems to break the r8169 driver on a
> > laptop of mine.  
> 
> Erm, no, as you bi-sected yourself already commit 9f0b54cd167219
> ("r8169: move switching optional clock on/off to pll power functions")
> 
> Broke your laptop, commit b1e3454d39f99 ("clk: x86: add "ether_clk" alias
> for Bay Trail / Cherry Trail") is about 18 months older.

Well, I'm no expert on power management, nor clock drivers or network
drivers. I'm an end user, so I had to accept Hans' claim that the
pointer should be NULL...

>[...]
> > That's because this kernel is built with CONFIG_PMC_ATOM=y, and looking
> > at the platform initialization code, the "ether_clk" alias is created
> > unconditionally. Hans already added.  
> 
> Petr, unconditionally is not really correct here, just as claiming
> above that my commit broke things was not really correct either.

I'm sorry if you feel offended. By "unconditionally" I mean that this
clock cannot be disabled by the user (e.g. with a kernel command line
option).

>[...]
> So with that all clarified lets try to see if we can figure out
> *why* this is actually happening.
> 
> Petr, can you describe your hardware in some more detail,
> in the bits quoted when you first Cc-ed me there is not that
> much detail. What is the vendor and model of your laptop?

Seeing that Heiner has already agreed to revert his patch, I'm not sure
this is still relevant, but here I go.

This is an ASUS X453MA notebook with a dual Intel(R) Celeron(R) CPU
N2840  @ 2.16GHz, so it's totally expected that my system uses the
Intel Atom Processor E3800 Series drivers.

> Looking closer at commit 9f0b54cd167219
> ("r8169: move switching optional clock on/off to pll power functions")
> I notice that the functions which now enable/disable the clock:
> rtl_pll_power_up() and rtl_pll_power_down()
> 
> Only run when the interface is up during suspend/resume.
> Petr, I guess the laptop is not connected to ethernet when you
> hibernate it?

The laptop is always connected to an active 1000G Ethernet switch port.

However, there seem to be two bugs related to the Realtek driver: one
causes a hard hang on driver load (that's the one we're handling here),
the other is related to incorrect Rx queue initialization after resume.
They may or may not be related.

> That means that on resume the clock will not be re-enabled.
> 
> This is a subtle but important change and I believe that
> this is what is breaking things. I guess that the PLL which
> rtl_pll_power_up() / rtl_pll_power_down() controls is only
> used for ethernet-timing.  But the external clock controlled
> through pt->clk is a replacement for using an external
> crystal with the r8169 chip. So with it disabled, the entire
> chip does not have a clock and is essentially dead.
> It can then e.g. not respond to any pci-e reads/writes done
> to it.

...which would nicely explain the hang, indeed.

Thanks,
Petr T

> So I believe that the proper fix for this is to revert
> commit 9f0b54cd167219
> ("r8169: move switching optional clock on/off to pll power functions")
> 
> As that caused the whole chip's clock to be left off after
> a suspend/resume while the interface is down.
> 
> Also some remarks about this while I'm being a bit grumpy about
> all this anyways (sorry):
> 
> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
> to pll power functions") commit's message does not seem to really
> explain why this change was made...
> 
> 2. If a git blame would have been done to find the commit adding
> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional
> ether_clk clock") then you could have known that the clk in question
> is an external clock for the entire chip, the commit message pretty
> clearly states this (although "the entire" part is implied only) :
> 
> "On some boards a platform clock is used as clock for the r8169 chip,
> this commit adds support for getting and enabling this clock (assuming
> it has an "ether_clk" alias set on it).
> 
> This is related to commit d31fd43c0f9a ("clk: x86: Do not gate clocks
> enabled by the firmware") which is a previous attempt to fix this for
> some x86 boards, but this causes all Cherry Trail SoC using boards to
> not reach there lowest power states when suspending.
> 
> This commit (together with an atom-pmc-clk driver commit adding the
> alias) fixes things properly by making the r8169 get the clock and
> enable it when it needs it."
> 
> Regards,
> 
> Hans
> 


[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-29 20:35                               ` Heiner Kallweit
  2020-09-29 20:42                                 ` Heiner Kallweit
@ 2020-09-30  9:04                                 ` Hans de Goede
  2020-09-30 15:47                                   ` Heiner Kallweit
  1 sibling, 1 reply; 34+ messages in thread
From: Hans de Goede @ 2020-09-30  9:04 UTC (permalink / raw)
  To: Heiner Kallweit, Petr Tesarik
  Cc: Realtek linux nic maintainers, netdev, linux-clk

Hi,

On 9/29/20 10:35 PM, Heiner Kallweit wrote:
> On 29.09.2020 22:08, Hans de Goede wrote:

<snip>

>> Also some remarks about this while I'm being a bit grumpy about
>> all this anyways (sorry):
>>
>> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
>> to pll power functions") commit's message does not seem to really
>> explain why this change was made...
>>
>> 2. If a git blame would have been done to find the commit adding
>> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
>> then you could have known that the clk in question is an external
>> clock for the entire chip, the commit message pretty clearly states
>> this (although "the entire" part is implied only) :
>>
>> "On some boards a platform clock is used as clock for the r8169 chip,
>> this commit adds support for getting and enabling this clock (assuming
>> it has an "ether_clk" alias set on it).
>>
> Even if the missing clock would stop the network chip completely,
> this shouldn't freeze the system as described by Petr.
> In some old RTL8169S spec an external 25MHz clock is mentioned,
> what matches the MII bus frequency. Therefore I'm not 100% convinced
> that the clock is needed for basic chip operation, but due to
> Realtek not releasing datasheets I can't verify this.

Well if that 25 MHz is the only clock the chip has, then it basically
has to be on all the time since all clocked digital ASICs cannot work
without a clock.  Now pci-e is a packet-switched point-to-point bus,
so the ethernet chip not working should not freeze the entire system,
but I'm not really surprised that even though it should not do that,
that it still does.

> But yes, if reverting this change avoids the issue on Petr's system,
> then we should do it. A simple mechanical revert wouldn't work because
> source file structure has changed since then, so I'll prepare a patch
> that effectively reverts the change.

Great, thank you.

Regards,

Hans


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30  9:04                                 ` Hans de Goede
@ 2020-09-30 15:47                                   ` Heiner Kallweit
  2020-09-30 16:41                                     ` Petr Tesarik
  2020-09-30 18:10                                     ` Petr Tesarik
  0 siblings, 2 replies; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-30 15:47 UTC (permalink / raw)
  To: Hans de Goede, Petr Tesarik
  Cc: Realtek linux nic maintainers, netdev, linux-clk

On 30.09.2020 11:04, Hans de Goede wrote:
> Hi,
> 
> On 9/29/20 10:35 PM, Heiner Kallweit wrote:
>> On 29.09.2020 22:08, Hans de Goede wrote:
> 
> <snip>
> 
>>> Also some remarks about this while I'm being a bit grumpy about
>>> all this anyways (sorry):
>>>
>>> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
>>> to pll power functions") commit's message does not seem to really
>>> explain why this change was made...
>>>
>>> 2. If a git blame would have been done to find the commit adding
>>> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
>>> then you could have known that the clk in question is an external
>>> clock for the entire chip, the commit message pretty clearly states
>>> this (although "the entire" part is implied only) :
>>>
>>> "On some boards a platform clock is used as clock for the r8169 chip,
>>> this commit adds support for getting and enabling this clock (assuming
>>> it has an "ether_clk" alias set on it).
>>>
>> Even if the missing clock would stop the network chip completely,
>> this shouldn't freeze the system as described by Petr.
>> In some old RTL8169S spec an external 25MHz clock is mentioned,
>> what matches the MII bus frequency. Therefore I'm not 100% convinced
>> that the clock is needed for basic chip operation, but due to
>> Realtek not releasing datasheets I can't verify this.
> 
> Well if that 25 MHz is the only clock the chip has, then it basically
> has to be on all the time since all clocked digital ASICs cannot work
> without a clock.  Now pci-e is a packet-switched point-to-point bus,
> so the ethernet chip not working should not freeze the entire system,
> but I'm not really surprised that even though it should not do that,
> that it still does.
> 
>> But yes, if reverting this change avoids the issue on Petr's system,
>> then we should do it. A simple mechanical revert wouldn't work because
>> source file structure has changed since then, so I'll prepare a patch
>> that effectively reverts the change.
> 
> Great, thank you.
> 
> Regards,
> 
> Hans
> 

Petr,
in the following I send two patches. First one is supposed to fix the freeze.
It also fixes another issue that existed before my ether_clk change:
ether_clk was disabled on suspend even if WoL is enabled. And the network
chip most likely needs the clock to check for WoL packets.
Please let me know whether it fixes the freeze, then I'll add your Tested-by.

Second patch is a re-send of the one I sent before, it should fix
the rx issues after resume from suspend for you.


diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 6c7c004c2..72351c5b0 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -2236,14 +2236,10 @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
 	default:
 		break;
 	}
-
-	clk_disable_unprepare(tp->clk);
 }
 
 static void rtl_pll_power_up(struct rtl8169_private *tp)
 {
-	clk_prepare_enable(tp->clk);
-
 	switch (tp->mac_version) {
 	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33:
 	case RTL_GIGA_MAC_VER_37:
@@ -4820,29 +4816,39 @@ static void rtl8169_net_suspend(struct rtl8169_private *tp)
 
 #ifdef CONFIG_PM
 
+static int rtl8169_net_resume(struct rtl8169_private *tp)
+{
+	rtl_rar_set(tp, tp->dev->dev_addr);
+
+	if (tp->TxDescArray)
+		rtl8169_up(tp);
+
+	netif_device_attach(tp->dev);
+
+	return 0;
+}
+
 static int __maybe_unused rtl8169_suspend(struct device *device)
 {
 	struct rtl8169_private *tp = dev_get_drvdata(device);
 
 	rtnl_lock();
 	rtl8169_net_suspend(tp);
+	if (!device_may_wakeup(tp_to_dev(tp)))
+		clk_disable_unprepare(tp->clk);
 	rtnl_unlock();
 
 	return 0;
 }
 
-static int rtl8169_resume(struct device *device)
+static int __maybe_unused rtl8169_resume(struct device *device)
 {
 	struct rtl8169_private *tp = dev_get_drvdata(device);
 
-	rtl_rar_set(tp, tp->dev->dev_addr);
-
-	if (tp->TxDescArray)
-		rtl8169_up(tp);
+	if (!device_may_wakeup(tp_to_dev(tp)))
+		clk_prepare_enable(tp->clk);
 
-	netif_device_attach(tp->dev);
-
-	return 0;
+	return rtl8169_net_resume(tp);
 }
 
 static int rtl8169_runtime_suspend(struct device *device)
@@ -4868,7 +4874,7 @@ static int rtl8169_runtime_resume(struct device *device)
 
 	__rtl8169_set_wol(tp, tp->saved_wolopts);
 
-	return rtl8169_resume(device);
+	return rtl8169_net_resume(tp);
 }
 
 static int rtl8169_runtime_idle(struct device *device)
-- 
2.28.0









diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 9e4e6a883..4fb49fd0d 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -4837,6 +4837,10 @@ static int rtl8169_resume(struct device *device)
 
 	rtl_rar_set(tp, tp->dev->dev_addr);
 
+	/* Reportedly at least Asus X453MA corrupts packets otherwise */
+	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
+		rtl_init_rxcfg(tp);
+
 	if (tp->TxDescArray)
 		rtl8169_up(tp);
 
-- 
2.28.0



^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30 15:47                                   ` Heiner Kallweit
@ 2020-09-30 16:41                                     ` Petr Tesarik
  2020-09-30 17:13                                       ` Heiner Kallweit
  2020-09-30 17:32                                       ` Petr Tesarik
  2020-09-30 18:10                                     ` Petr Tesarik
  1 sibling, 2 replies; 34+ messages in thread
From: Petr Tesarik @ 2020-09-30 16:41 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Hans de Goede, Realtek linux nic maintainers, netdev, linux-clk

[-- Attachment #1: Type: text/plain, Size: 5233 bytes --]

HI Heiner,

On Wed, 30 Sep 2020 17:47:15 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 30.09.2020 11:04, Hans de Goede wrote:
> > Hi,
> > 
> > On 9/29/20 10:35 PM, Heiner Kallweit wrote:  
> >> On 29.09.2020 22:08, Hans de Goede wrote:  
> > 
> > <snip>
> >   
> >>> Also some remarks about this while I'm being a bit grumpy about
> >>> all this anyways (sorry):
> >>>
> >>> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
> >>> to pll power functions") commit's message does not seem to really
> >>> explain why this change was made...
> >>>
> >>> 2. If a git blame would have been done to find the commit adding
> >>> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
> >>> then you could have known that the clk in question is an external
> >>> clock for the entire chip, the commit message pretty clearly states
> >>> this (although "the entire" part is implied only) :
> >>>
> >>> "On some boards a platform clock is used as clock for the r8169 chip,
> >>> this commit adds support for getting and enabling this clock (assuming
> >>> it has an "ether_clk" alias set on it).
> >>>  
> >> Even if the missing clock would stop the network chip completely,
> >> this shouldn't freeze the system as described by Petr.
> >> In some old RTL8169S spec an external 25MHz clock is mentioned,
> >> what matches the MII bus frequency. Therefore I'm not 100% convinced
> >> that the clock is needed for basic chip operation, but due to
> >> Realtek not releasing datasheets I can't verify this.  
> > 
> > Well if that 25 MHz is the only clock the chip has, then it basically
> > has to be on all the time since all clocked digital ASICs cannot work
> > without a clock.  Now pci-e is a packet-switched point-to-point bus,
> > so the ethernet chip not working should not freeze the entire system,
> > but I'm not really surprised that even though it should not do that,
> > that it still does.
> >   
> >> But yes, if reverting this change avoids the issue on Petr's system,
> >> then we should do it. A simple mechanical revert wouldn't work because
> >> source file structure has changed since then, so I'll prepare a patch
> >> that effectively reverts the change.  
> > 
> > Great, thank you.
> > 
> > Regards,
> > 
> > Hans
> >   
> 
> Petr,
> in the following I send two patches. First one is supposed to fix the freeze.
> It also fixes another issue that existed before my ether_clk change:
> ether_clk was disabled on suspend even if WoL is enabled. And the network
> chip most likely needs the clock to check for WoL packets.

Should I also check whether WoL works? Plus, it would be probably good
if I could check whether it indeed didn't work before the change. ;-)

> Please let me know whether it fixes the freeze, then I'll add your Tested-by.

Will do.

> Second patch is a re-send of the one I sent before, it should fix
> the rx issues after resume from suspend for you.

All right, going to rebuild the kernel and see how it goes.

Stay tuned,
Petr T


> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 6c7c004c2..72351c5b0 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -2236,14 +2236,10 @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
>  	default:
>  		break;
>  	}
> -
> -	clk_disable_unprepare(tp->clk);
>  }
>  
>  static void rtl_pll_power_up(struct rtl8169_private *tp)
>  {
> -	clk_prepare_enable(tp->clk);
> -
>  	switch (tp->mac_version) {
>  	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33:
>  	case RTL_GIGA_MAC_VER_37:
> @@ -4820,29 +4816,39 @@ static void rtl8169_net_suspend(struct rtl8169_private *tp)
>  
>  #ifdef CONFIG_PM
>  
> +static int rtl8169_net_resume(struct rtl8169_private *tp)
> +{
> +	rtl_rar_set(tp, tp->dev->dev_addr);
> +
> +	if (tp->TxDescArray)
> +		rtl8169_up(tp);
> +
> +	netif_device_attach(tp->dev);
> +
> +	return 0;
> +}
> +
>  static int __maybe_unused rtl8169_suspend(struct device *device)
>  {
>  	struct rtl8169_private *tp = dev_get_drvdata(device);
>  
>  	rtnl_lock();
>  	rtl8169_net_suspend(tp);
> +	if (!device_may_wakeup(tp_to_dev(tp)))
> +		clk_disable_unprepare(tp->clk);
>  	rtnl_unlock();
>  
>  	return 0;
>  }
>  
> -static int rtl8169_resume(struct device *device)
> +static int __maybe_unused rtl8169_resume(struct device *device)
>  {
>  	struct rtl8169_private *tp = dev_get_drvdata(device);
>  
> -	rtl_rar_set(tp, tp->dev->dev_addr);
> -
> -	if (tp->TxDescArray)
> -		rtl8169_up(tp);
> +	if (!device_may_wakeup(tp_to_dev(tp)))
> +		clk_prepare_enable(tp->clk);
>  
> -	netif_device_attach(tp->dev);
> -
> -	return 0;
> +	return rtl8169_net_resume(tp);
>  }
>  
>  static int rtl8169_runtime_suspend(struct device *device)
> @@ -4868,7 +4874,7 @@ static int rtl8169_runtime_resume(struct device *device)
>  
>  	__rtl8169_set_wol(tp, tp->saved_wolopts);
>  
> -	return rtl8169_resume(device);
> +	return rtl8169_net_resume(tp);
>  }
>  
>  static int rtl8169_runtime_idle(struct device *device)


[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30 16:41                                     ` Petr Tesarik
@ 2020-09-30 17:13                                       ` Heiner Kallweit
  2020-09-30 17:32                                       ` Petr Tesarik
  1 sibling, 0 replies; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-30 17:13 UTC (permalink / raw)
  To: Petr Tesarik
  Cc: Hans de Goede, Realtek linux nic maintainers, netdev, linux-clk

On 30.09.2020 18:41, Petr Tesarik wrote:
> HI Heiner,
> 
> On Wed, 30 Sep 2020 17:47:15 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 30.09.2020 11:04, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 9/29/20 10:35 PM, Heiner Kallweit wrote:  
>>>> On 29.09.2020 22:08, Hans de Goede wrote:  
>>>
>>> <snip>
>>>   
>>>>> Also some remarks about this while I'm being a bit grumpy about
>>>>> all this anyways (sorry):
>>>>>
>>>>> 1. 9f0b54cd167219 ("r8169: move switching optional clock on/off
>>>>> to pll power functions") commit's message does not seem to really
>>>>> explain why this change was made...
>>>>>
>>>>> 2. If a git blame would have been done to find the commit adding
>>>>> the clk support: commit c2f6f3ee7f22 ("r8169: Get and enable optional ether_clk clock")
>>>>> then you could have known that the clk in question is an external
>>>>> clock for the entire chip, the commit message pretty clearly states
>>>>> this (although "the entire" part is implied only) :
>>>>>
>>>>> "On some boards a platform clock is used as clock for the r8169 chip,
>>>>> this commit adds support for getting and enabling this clock (assuming
>>>>> it has an "ether_clk" alias set on it).
>>>>>  
>>>> Even if the missing clock would stop the network chip completely,
>>>> this shouldn't freeze the system as described by Petr.
>>>> In some old RTL8169S spec an external 25MHz clock is mentioned,
>>>> what matches the MII bus frequency. Therefore I'm not 100% convinced
>>>> that the clock is needed for basic chip operation, but due to
>>>> Realtek not releasing datasheets I can't verify this.  
>>>
>>> Well if that 25 MHz is the only clock the chip has, then it basically
>>> has to be on all the time since all clocked digital ASICs cannot work
>>> without a clock.  Now pci-e is a packet-switched point-to-point bus,
>>> so the ethernet chip not working should not freeze the entire system,
>>> but I'm not really surprised that even though it should not do that,
>>> that it still does.
>>>   
>>>> But yes, if reverting this change avoids the issue on Petr's system,
>>>> then we should do it. A simple mechanical revert wouldn't work because
>>>> source file structure has changed since then, so I'll prepare a patch
>>>> that effectively reverts the change.  
>>>
>>> Great, thank you.
>>>
>>> Regards,
>>>
>>> Hans
>>>   
>>
>> Petr,
>> in the following I send two patches. First one is supposed to fix the freeze.
>> It also fixes another issue that existed before my ether_clk change:
>> ether_clk was disabled on suspend even if WoL is enabled. And the network
>> chip most likely needs the clock to check for WoL packets.
> 
> Should I also check whether WoL works? Plus, it would be probably good
> if I could check whether it indeed didn't work before the change. ;-)
> 
This would be perfect and much appreciated!


>> Please let me know whether it fixes the freeze, then I'll add your Tested-by.
> 
> Will do.
> 
>> Second patch is a re-send of the one I sent before, it should fix
>> the rx issues after resume from suspend for you.
> 
> All right, going to rebuild the kernel and see how it goes.
> 
> Stay tuned,
> Petr T
> 
> 
>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>> index 6c7c004c2..72351c5b0 100644
>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>> @@ -2236,14 +2236,10 @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
>>  	default:
>>  		break;
>>  	}
>> -
>> -	clk_disable_unprepare(tp->clk);
>>  }
>>  
>>  static void rtl_pll_power_up(struct rtl8169_private *tp)
>>  {
>> -	clk_prepare_enable(tp->clk);
>> -
>>  	switch (tp->mac_version) {
>>  	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33:
>>  	case RTL_GIGA_MAC_VER_37:
>> @@ -4820,29 +4816,39 @@ static void rtl8169_net_suspend(struct rtl8169_private *tp)
>>  
>>  #ifdef CONFIG_PM
>>  
>> +static int rtl8169_net_resume(struct rtl8169_private *tp)
>> +{
>> +	rtl_rar_set(tp, tp->dev->dev_addr);
>> +
>> +	if (tp->TxDescArray)
>> +		rtl8169_up(tp);
>> +
>> +	netif_device_attach(tp->dev);
>> +
>> +	return 0;
>> +}
>> +
>>  static int __maybe_unused rtl8169_suspend(struct device *device)
>>  {
>>  	struct rtl8169_private *tp = dev_get_drvdata(device);
>>  
>>  	rtnl_lock();
>>  	rtl8169_net_suspend(tp);
>> +	if (!device_may_wakeup(tp_to_dev(tp)))
>> +		clk_disable_unprepare(tp->clk);
>>  	rtnl_unlock();
>>  
>>  	return 0;
>>  }
>>  
>> -static int rtl8169_resume(struct device *device)
>> +static int __maybe_unused rtl8169_resume(struct device *device)
>>  {
>>  	struct rtl8169_private *tp = dev_get_drvdata(device);
>>  
>> -	rtl_rar_set(tp, tp->dev->dev_addr);
>> -
>> -	if (tp->TxDescArray)
>> -		rtl8169_up(tp);
>> +	if (!device_may_wakeup(tp_to_dev(tp)))
>> +		clk_prepare_enable(tp->clk);
>>  
>> -	netif_device_attach(tp->dev);
>> -
>> -	return 0;
>> +	return rtl8169_net_resume(tp);
>>  }
>>  
>>  static int rtl8169_runtime_suspend(struct device *device)
>> @@ -4868,7 +4874,7 @@ static int rtl8169_runtime_resume(struct device *device)
>>  
>>  	__rtl8169_set_wol(tp, tp->saved_wolopts);
>>  
>> -	return rtl8169_resume(device);
>> +	return rtl8169_net_resume(tp);
>>  }
>>  
>>  static int rtl8169_runtime_idle(struct device *device)
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30 16:41                                     ` Petr Tesarik
  2020-09-30 17:13                                       ` Heiner Kallweit
@ 2020-09-30 17:32                                       ` Petr Tesarik
  2020-09-30 18:00                                         ` Petr Tesarik
  1 sibling, 1 reply; 34+ messages in thread
From: Petr Tesarik @ 2020-09-30 17:32 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 4200 bytes --]

Hi Heiner,

On Wed, 30 Sep 2020 18:41:24 +0200
Petr Tesarik <ptesarik@suse.cz> wrote:

> HI Heiner,
> 
> On Wed, 30 Sep 2020 17:47:15 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>[...]
> > Petr,
> > in the following I send two patches. First one is supposed to fix the freeze.
> > It also fixes another issue that existed before my ether_clk change:
> > ether_clk was disabled on suspend even if WoL is enabled. And the network
> > chip most likely needs the clock to check for WoL packets.  
> 
> Should I also check whether WoL works? Plus, it would be probably good
> if I could check whether it indeed didn't work before the change. ;-)
> 
> > Please let me know whether it fixes the freeze, then I'll add your
> >   Tested-by.  
> 
> Will do.

Here are the results:

- WoL does not work without your patch; this was tested with 5.8.4,
  because master freezes hard on load.

- I got a kernel crash when I woke up the laptop through keyboard; I am
  not sure if it is related, although I could spend some time looking
  at the resulting crash dump if you think it's worth the time.

- After applying your first patch on top of v5.9-rc6, the driver can be
  loaded, but stops working properly on suspend/resume.

- WoL still does not work, but I'm no longer getting a kernel crash at
  least. ;-)

Tested-by: Petr Tesarik <ptesarik@suse.com>

> > Second patch is a re-send of the one I sent before, it should fix
> > the rx issues after resume from suspend for you.  
> 
> All right, going to rebuild the kernel and see how it goes.

This second patch does not fix suspend/resume for me, unfortunately. The
receive is still erratic, but the behaviour is now different: some
packets are truncated, other packets are delayed by approx. 1024 ms.
Yes, 1024 may ring a bell, but I've no idea which.

HTH,
Petr T

> Stay tuned,
> Petr T
> 
> 
> > diff --git a/drivers/net/ethernet/realtek/r8169_main.c
> >   b/drivers/net/ethernet/realtek/r8169_main.c index
> >   6c7c004c2..72351c5b0 100644 ---
> >   a/drivers/net/ethernet/realtek/r8169_main.c +++
> >   b/drivers/net/ethernet/realtek/r8169_main.c @@ -2236,14 +2236,10
> >   @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
> >   default: break;
> >  	}
> > -
> > -	clk_disable_unprepare(tp->clk);
> >  }
> >  
> >  static void rtl_pll_power_up(struct rtl8169_private *tp)
> >  {
> > -	clk_prepare_enable(tp->clk);
> > -
> >  	switch (tp->mac_version) {
> >  	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33:
> >  	case RTL_GIGA_MAC_VER_37:
> > @@ -4820,29 +4816,39 @@ static void rtl8169_net_suspend(struct
> >   rtl8169_private *tp) 
> >  #ifdef CONFIG_PM
> >  
> > +static int rtl8169_net_resume(struct rtl8169_private *tp)
> > +{
> > +	rtl_rar_set(tp, tp->dev->dev_addr);
> > +
> > +	if (tp->TxDescArray)
> > +		rtl8169_up(tp);
> > +
> > +	netif_device_attach(tp->dev);
> > +
> > +	return 0;
> > +}
> > +
> >  static int __maybe_unused rtl8169_suspend(struct device *device)
> >  {
> >  	struct rtl8169_private *tp = dev_get_drvdata(device);
> >  
> >  	rtnl_lock();
> >  	rtl8169_net_suspend(tp);
> > +	if (!device_may_wakeup(tp_to_dev(tp)))
> > +		clk_disable_unprepare(tp->clk);
> >  	rtnl_unlock();
> >  
> >  	return 0;
> >  }
> >  
> > -static int rtl8169_resume(struct device *device)
> > +static int __maybe_unused rtl8169_resume(struct device *device)
> >  {
> >  	struct rtl8169_private *tp = dev_get_drvdata(device);
> >  
> > -	rtl_rar_set(tp, tp->dev->dev_addr);
> > -
> > -	if (tp->TxDescArray)
> > -		rtl8169_up(tp);
> > +	if (!device_may_wakeup(tp_to_dev(tp)))
> > +		clk_prepare_enable(tp->clk);
> >  
> > -	netif_device_attach(tp->dev);
> > -
> > -	return 0;
> > +	return rtl8169_net_resume(tp);
> >  }
> >  
> >  static int rtl8169_runtime_suspend(struct device *device)
> > @@ -4868,7 +4874,7 @@ static int rtl8169_runtime_resume(struct
> >   device *device) 
> >  	__rtl8169_set_wol(tp, tp->saved_wolopts);
> >  
> > -	return rtl8169_resume(device);
> > +	return rtl8169_net_resume(tp);
> >  }
> >  
> >  static int rtl8169_runtime_idle(struct device *device)  
> 


[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30 17:32                                       ` Petr Tesarik
@ 2020-09-30 18:00                                         ` Petr Tesarik
  2020-09-30 20:11                                           ` Heiner Kallweit
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Tesarik @ 2020-09-30 18:00 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 5106 bytes --]

Hi Heiner again,

On Wed, 30 Sep 2020 19:32:59 +0200
Petr Tesarik <ptesarik@suse.cz> wrote:

> Hi Heiner,
> 
> On Wed, 30 Sep 2020 18:41:24 +0200
> Petr Tesarik <ptesarik@suse.cz> wrote:
> 
> > HI Heiner,
> > 
> > On Wed, 30 Sep 2020 17:47:15 +0200
> > Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >[...]  
> > > Petr,
> > > in the following I send two patches. First one is supposed to fix the freeze.
> > > It also fixes another issue that existed before my ether_clk change:
> > > ether_clk was disabled on suspend even if WoL is enabled. And the network
> > > chip most likely needs the clock to check for WoL packets.    
> > 
> > Should I also check whether WoL works? Plus, it would be probably good
> > if I could check whether it indeed didn't work before the change. ;-)
> >   
> > > Please let me know whether it fixes the freeze, then I'll add your
> > >   Tested-by.    
> > 
> > Will do.  
> 
> Here are the results:
> 
> - WoL does not work without your patch; this was tested with 5.8.4,
>   because master freezes hard on load.
> 
> - I got a kernel crash when I woke up the laptop through keyboard; I am
>   not sure if it is related, although I could spend some time looking
>   at the resulting crash dump if you think it's worth the time.
> 
> - After applying your first patch on top of v5.9-rc6, the driver can be
>   loaded, but stops working properly on suspend/resume.
> 
> - WoL still does not work, but I'm no longer getting a kernel crash at
>   least. ;-)
> 
> Tested-by: Petr Tesarik <ptesarik@suse.com>
> 
> > > Second patch is a re-send of the one I sent before, it should fix
> > > the rx issues after resume from suspend for you.    
> > 
> > All right, going to rebuild the kernel and see how it goes.  
> 
> This second patch does not fix suspend/resume for me, unfortunately. The
> receive is still erratic, but the behaviour is now different: some
> packets are truncated, other packets are delayed by approx. 1024 ms.
> Yes, 1024 may ring a bell, but I've no idea which.

I'm sorry, I added some debugging message, and as I was wondering why
the resume path is not run, I found out I was not properly reloading the
modified module. So, after fixing my build, your patch also fixes the
Rx queue on resume! Both with and without NetworkManager.

So, you can also add to the second patch:

Tested-by: Petr Tesarik <ptesarik@suse.com>

WoL still does not work on my laptop, but this might be an unrelated
issue, and I can even imagine the BIOS is buggy in this regard.

Petr T

> HTH,
> Petr T
> 
> > Stay tuned,
> > Petr T
> > 
> >   
> > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c
> > >   b/drivers/net/ethernet/realtek/r8169_main.c index
> > >   6c7c004c2..72351c5b0 100644 ---
> > >   a/drivers/net/ethernet/realtek/r8169_main.c +++
> > >   b/drivers/net/ethernet/realtek/r8169_main.c @@ -2236,14 +2236,10
> > >   @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
> > >   default: break;
> > >  	}
> > > -
> > > -	clk_disable_unprepare(tp->clk);
> > >  }
> > >  
> > >  static void rtl_pll_power_up(struct rtl8169_private *tp)
> > >  {
> > > -	clk_prepare_enable(tp->clk);
> > > -
> > >  	switch (tp->mac_version) {
> > >  	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33:
> > >  	case RTL_GIGA_MAC_VER_37:
> > > @@ -4820,29 +4816,39 @@ static void rtl8169_net_suspend(struct
> > >   rtl8169_private *tp) 
> > >  #ifdef CONFIG_PM
> > >  
> > > +static int rtl8169_net_resume(struct rtl8169_private *tp)
> > > +{
> > > +	rtl_rar_set(tp, tp->dev->dev_addr);
> > > +
> > > +	if (tp->TxDescArray)
> > > +		rtl8169_up(tp);
> > > +
> > > +	netif_device_attach(tp->dev);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static int __maybe_unused rtl8169_suspend(struct device *device)
> > >  {
> > >  	struct rtl8169_private *tp = dev_get_drvdata(device);
> > >  
> > >  	rtnl_lock();
> > >  	rtl8169_net_suspend(tp);
> > > +	if (!device_may_wakeup(tp_to_dev(tp)))
> > > +		clk_disable_unprepare(tp->clk);
> > >  	rtnl_unlock();
> > >  
> > >  	return 0;
> > >  }
> > >  
> > > -static int rtl8169_resume(struct device *device)
> > > +static int __maybe_unused rtl8169_resume(struct device *device)
> > >  {
> > >  	struct rtl8169_private *tp = dev_get_drvdata(device);
> > >  
> > > -	rtl_rar_set(tp, tp->dev->dev_addr);
> > > -
> > > -	if (tp->TxDescArray)
> > > -		rtl8169_up(tp);
> > > +	if (!device_may_wakeup(tp_to_dev(tp)))
> > > +		clk_prepare_enable(tp->clk);
> > >  
> > > -	netif_device_attach(tp->dev);
> > > -
> > > -	return 0;
> > > +	return rtl8169_net_resume(tp);
> > >  }
> > >  
> > >  static int rtl8169_runtime_suspend(struct device *device)
> > > @@ -4868,7 +4874,7 @@ static int rtl8169_runtime_resume(struct
> > >   device *device) 
> > >  	__rtl8169_set_wol(tp, tp->saved_wolopts);
> > >  
> > > -	return rtl8169_resume(device);
> > > +	return rtl8169_net_resume(tp);
> > >  }
> > >  
> > >  static int rtl8169_runtime_idle(struct device *device)    
> >   
> 


[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30 15:47                                   ` Heiner Kallweit
  2020-09-30 16:41                                     ` Petr Tesarik
@ 2020-09-30 18:10                                     ` Petr Tesarik
  1 sibling, 0 replies; 34+ messages in thread
From: Petr Tesarik @ 2020-09-30 18:10 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Hans de Goede, Realtek linux nic maintainers, netdev, linux-clk

[-- Attachment #1: Type: text/plain, Size: 1485 bytes --]

On Wed, 30 Sep 2020 17:47:15 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

>[...]
> Petr,
> in the following I send two patches. First one is supposed to fix the freeze.
> It also fixes another issue that existed before my ether_clk change:
> ether_clk was disabled on suspend even if WoL is enabled. And the network
> chip most likely needs the clock to check for WoL packets.
> Please let me know whether it fixes the freeze, then I'll add your Tested-by.
> 
> Second patch is a re-send of the one I sent before, it should fix
> the rx issues after resume from suspend for you.
> 
>[...]
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 9e4e6a883..4fb49fd0d 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -4837,6 +4837,10 @@ static int rtl8169_resume(struct device *device)
>  
>  	rtl_rar_set(tp, tp->dev->dev_addr);
>  
> +	/* Reportedly at least Asus X453MA corrupts packets otherwise */

Just a nitpick: The incoming packets are not corrupted, they are truncated:

+	/* Reportedly at least Asus X453MA truncates packets otherwise */

Other than that, like I have already written in another part of the thread:

Tested-by: Petr Tesarik <ptesarik@suse.com>

> +	if (tp->mac_version == RTL_GIGA_MAC_VER_37)
> +		rtl_init_rxcfg(tp);
> +
>  	if (tp->TxDescArray)
>  		rtl8169_up(tp);
>  
> -- 
> 2.28.0
> 

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30 18:00                                         ` Petr Tesarik
@ 2020-09-30 20:11                                           ` Heiner Kallweit
  2020-09-30 21:44                                             ` Petr Tesarik
  0 siblings, 1 reply; 34+ messages in thread
From: Heiner Kallweit @ 2020-09-30 20:11 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Realtek linux nic maintainers, netdev

On 30.09.2020 20:00, Petr Tesarik wrote:
> Hi Heiner again,
> 
> On Wed, 30 Sep 2020 19:32:59 +0200
> Petr Tesarik <ptesarik@suse.cz> wrote:
> 
>> Hi Heiner,
>>
>> On Wed, 30 Sep 2020 18:41:24 +0200
>> Petr Tesarik <ptesarik@suse.cz> wrote:
>>
>>> HI Heiner,
>>>
>>> On Wed, 30 Sep 2020 17:47:15 +0200
>>> Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>> [...]  
>>>> Petr,
>>>> in the following I send two patches. First one is supposed to fix the freeze.
>>>> It also fixes another issue that existed before my ether_clk change:
>>>> ether_clk was disabled on suspend even if WoL is enabled. And the network
>>>> chip most likely needs the clock to check for WoL packets.    
>>>
>>> Should I also check whether WoL works? Plus, it would be probably good
>>> if I could check whether it indeed didn't work before the change. ;-)
>>>   
>>>> Please let me know whether it fixes the freeze, then I'll add your
>>>>   Tested-by.    
>>>
>>> Will do.  
>>
>> Here are the results:
>>
>> - WoL does not work without your patch; this was tested with 5.8.4,
>>   because master freezes hard on load.
>>
>> - I got a kernel crash when I woke up the laptop through keyboard; I am
>>   not sure if it is related, although I could spend some time looking
>>   at the resulting crash dump if you think it's worth the time.
>>

This may be caused by the following code in the resume path in 5.8.
The chip is accessed before the clock gets enabled.

rtl_rar_set(tp, tp->dev->dev_addr);
clk_prepare_enable(tp->clk);


>> - After applying your first patch on top of v5.9-rc6, the driver can be
>>   loaded, but stops working properly on suspend/resume.
>>
>> - WoL still does not work, but I'm no longer getting a kernel crash at
>>   least. ;-)
>>
>> Tested-by: Petr Tesarik <ptesarik@suse.com>
>>
>>>> Second patch is a re-send of the one I sent before, it should fix
>>>> the rx issues after resume from suspend for you.    
>>>
>>> All right, going to rebuild the kernel and see how it goes.  
>>
>> This second patch does not fix suspend/resume for me, unfortunately. The
>> receive is still erratic, but the behaviour is now different: some
>> packets are truncated, other packets are delayed by approx. 1024 ms.
>> Yes, 1024 may ring a bell, but I've no idea which.
> 
> I'm sorry, I added some debugging message, and as I was wondering why
> the resume path is not run, I found out I was not properly reloading the
> modified module. So, after fixing my build, your patch also fixes the
> Rx queue on resume! Both with and without NetworkManager.
> 
> So, you can also add to the second patch:
> 
> Tested-by: Petr Tesarik <ptesarik@suse.com>
> 
Thanks a lot for all your efforts!

> WoL still does not work on my laptop, but this might be an unrelated
> issue, and I can even imagine the BIOS is buggy in this regard.
> 
A simple further check you could do:
After sending the WoL packet (that doesn't wake the system) you wake
the system by e.g. a keystroke. Then check in /proc/interrupts for
a PCIe PME interrupt. If there's a PME interrupt, then the network
chip successfully detected the WoL packet, and it seems we have to
blame the BIOS.

> Petr T
> 
>> HTH,
>> Petr T
>>
>>> Stay tuned,
>>> Petr T
>>>
>>>   
>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c
>>>>   b/drivers/net/ethernet/realtek/r8169_main.c index
>>>>   6c7c004c2..72351c5b0 100644 ---
>>>>   a/drivers/net/ethernet/realtek/r8169_main.c +++
>>>>   b/drivers/net/ethernet/realtek/r8169_main.c @@ -2236,14 +2236,10
>>>>   @@ static void rtl_pll_power_down(struct rtl8169_private *tp)
>>>>   default: break;
>>>>  	}
>>>> -
>>>> -	clk_disable_unprepare(tp->clk);
>>>>  }
>>>>  
>>>>  static void rtl_pll_power_up(struct rtl8169_private *tp)
>>>>  {
>>>> -	clk_prepare_enable(tp->clk);
>>>> -
>>>>  	switch (tp->mac_version) {
>>>>  	case RTL_GIGA_MAC_VER_25 ... RTL_GIGA_MAC_VER_33:
>>>>  	case RTL_GIGA_MAC_VER_37:
>>>> @@ -4820,29 +4816,39 @@ static void rtl8169_net_suspend(struct
>>>>   rtl8169_private *tp) 
>>>>  #ifdef CONFIG_PM
>>>>  
>>>> +static int rtl8169_net_resume(struct rtl8169_private *tp)
>>>> +{
>>>> +	rtl_rar_set(tp, tp->dev->dev_addr);
>>>> +
>>>> +	if (tp->TxDescArray)
>>>> +		rtl8169_up(tp);
>>>> +
>>>> +	netif_device_attach(tp->dev);
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int __maybe_unused rtl8169_suspend(struct device *device)
>>>>  {
>>>>  	struct rtl8169_private *tp = dev_get_drvdata(device);
>>>>  
>>>>  	rtnl_lock();
>>>>  	rtl8169_net_suspend(tp);
>>>> +	if (!device_may_wakeup(tp_to_dev(tp)))
>>>> +		clk_disable_unprepare(tp->clk);
>>>>  	rtnl_unlock();
>>>>  
>>>>  	return 0;
>>>>  }
>>>>  
>>>> -static int rtl8169_resume(struct device *device)
>>>> +static int __maybe_unused rtl8169_resume(struct device *device)
>>>>  {
>>>>  	struct rtl8169_private *tp = dev_get_drvdata(device);
>>>>  
>>>> -	rtl_rar_set(tp, tp->dev->dev_addr);
>>>> -
>>>> -	if (tp->TxDescArray)
>>>> -		rtl8169_up(tp);
>>>> +	if (!device_may_wakeup(tp_to_dev(tp)))
>>>> +		clk_prepare_enable(tp->clk);
>>>>  
>>>> -	netif_device_attach(tp->dev);
>>>> -
>>>> -	return 0;
>>>> +	return rtl8169_net_resume(tp);
>>>>  }
>>>>  
>>>>  static int rtl8169_runtime_suspend(struct device *device)
>>>> @@ -4868,7 +4874,7 @@ static int rtl8169_runtime_resume(struct
>>>>   device *device) 
>>>>  	__rtl8169_set_wol(tp, tp->saved_wolopts);
>>>>  
>>>> -	return rtl8169_resume(device);
>>>> +	return rtl8169_net_resume(tp);
>>>>  }
>>>>  
>>>>  static int rtl8169_runtime_idle(struct device *device)    
>>>   
>>
> 
Heiner

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30 20:11                                           ` Heiner Kallweit
@ 2020-09-30 21:44                                             ` Petr Tesarik
  2020-10-01  6:14                                               ` Heiner Kallweit
  0 siblings, 1 reply; 34+ messages in thread
From: Petr Tesarik @ 2020-09-30 21:44 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev

[-- Attachment #1: Type: text/plain, Size: 1179 bytes --]

On Wed, 30 Sep 2020 22:11:02 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 30.09.2020 20:00, Petr Tesarik wrote:
>[...]
> > WoL still does not work on my laptop, but this might be an unrelated
> > issue, and I can even imagine the BIOS is buggy in this regard.
> >   
> A simple further check you could do:
> After sending the WoL packet (that doesn't wake the system) you wake
> the system by e.g. a keystroke. Then check in /proc/interrupts for
> a PCIe PME interrupt. If there's a PME interrupt, then the network
> chip successfully detected the WoL packet, and it seems we have to
> blame the BIOS.

Well, the switch does not sense carrier on the corresponding port while
the laptop is suspended, so I'm pretty sure nothing gets delivered over
that link. No, I suspect the ACPI suspend method turns off the RTL8208
PHY chip, maybe as a side effect...

But I don't need working WoL on this system - look, this is cheap old
stuff that the previous owner considered electronic waste. I even
suspect this was because wired network never worked well after resume.
With your fix, this piece can still serve some purpose.

Thank you!

Petr T

[-- Attachment #2: Digitální podpis OpenPGP --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: RTL8402 stops working after hibernate/resume
  2020-09-30 21:44                                             ` Petr Tesarik
@ 2020-10-01  6:14                                               ` Heiner Kallweit
  0 siblings, 0 replies; 34+ messages in thread
From: Heiner Kallweit @ 2020-10-01  6:14 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Realtek linux nic maintainers, netdev

On 30.09.2020 23:44, Petr Tesarik wrote:
> On Wed, 30 Sep 2020 22:11:02 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 30.09.2020 20:00, Petr Tesarik wrote:
>> [...]
>>> WoL still does not work on my laptop, but this might be an unrelated
>>> issue, and I can even imagine the BIOS is buggy in this regard.
>>>   
>> A simple further check you could do:
>> After sending the WoL packet (that doesn't wake the system) you wake
>> the system by e.g. a keystroke. Then check in /proc/interrupts for
>> a PCIe PME interrupt. If there's a PME interrupt, then the network
>> chip successfully detected the WoL packet, and it seems we have to
>> blame the BIOS.
> 
> Well, the switch does not sense carrier on the corresponding port while
> the laptop is suspended, so I'm pretty sure nothing gets delivered over
> that link. No, I suspect the ACPI suspend method turns off the RTL8208
> PHY chip, maybe as a side effect...
> 
Did you configure WoL explicitly, e.g. with ethtool -s <if> wol g ?

> But I don't need working WoL on this system - look, this is cheap old
> stuff that the previous owner considered electronic waste. I even
> suspect this was because wired network never worked well after resume.
> With your fix, this piece can still serve some purpose.
> 
Great, I will submit the fixes today.

> Thank you!
> 
> Petr T
> 
Heiner

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2020-10-01  6:15 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15  8:28 RTL8402 stops working after hibernate/resume Petr Tesarik
2020-07-15 15:22 ` Heiner Kallweit
2020-07-16  8:58   ` Petr Tesarik
2020-07-18 12:07     ` Heiner Kallweit
2020-09-03  8:41       ` Petr Tesarik
2020-09-17 14:10         ` Petr Tesarik
2020-09-23  9:57         ` Heiner Kallweit
2020-09-24 19:14           ` Petr Tesarik
2020-09-24 19:37             ` Heiner Kallweit
2020-09-24 20:12             ` Heiner Kallweit
2020-09-25  7:30               ` Petr Tesarik
2020-09-25  8:54                 ` Petr Tesarik
2020-09-25  9:44                   ` Heiner Kallweit
2020-09-25  9:52                     ` Petr Tesarik
2020-09-25 12:56                       ` Petr Tesarik
2020-09-25 14:47                         ` Heiner Kallweit
2020-09-29 19:07                           ` Petr Tesarik
2020-09-29 19:41                             ` Heiner Kallweit
2020-09-29 20:08                             ` Hans de Goede
2020-09-29 20:29                               ` Hans de Goede
2020-09-29 20:35                               ` Heiner Kallweit
2020-09-29 20:42                                 ` Heiner Kallweit
2020-09-30  9:04                                 ` Hans de Goede
2020-09-30 15:47                                   ` Heiner Kallweit
2020-09-30 16:41                                     ` Petr Tesarik
2020-09-30 17:13                                       ` Heiner Kallweit
2020-09-30 17:32                                       ` Petr Tesarik
2020-09-30 18:00                                         ` Petr Tesarik
2020-09-30 20:11                                           ` Heiner Kallweit
2020-09-30 21:44                                             ` Petr Tesarik
2020-10-01  6:14                                               ` Heiner Kallweit
2020-09-30 18:10                                     ` Petr Tesarik
2020-09-29 21:17                               ` Petr Tesarik
2020-07-15 15:25 ` Heiner Kallweit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.