linux-kernel.vger.kernel.org archive mirror
* Regression in v4.20 with net phy soft reset changes
@ 2019-01-09 19:06 Tony Lindgren
  2019-01-09 19:28 ` Heiner Kallweit
  0 siblings, 1 reply; 7+ messages in thread
From: Tony Lindgren @ 2019-01-09 19:06 UTC
  To: Florian Fainelli, David S. Miller
  Cc: Andrew Lunn, Bartosz Golaszewski, Chris Healy, Clemens Gruber,
	Grygorii Strashko, Ivan Khoronzhuk, Keerthy, Murali Karicheri,
	Rex Chang, Sekhar Nori, Tero Kristo, WingMan Kwok, netdev,
	linux-kernel, linux-arm-kernel, linux-omap

Hi all,

Commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") caused
a regression where a suspend/resume cycle fails to bring up Ethernet on at
least cpsw on am437x-sk-evm.

Keerthy noticed this may not happen on the first resume, but usually
happens after a few suspend/resume cycles. The largest number of working
suspend/resume cycles I've seen with the commit above is three.

Any ideas what might be going wrong?

Note that, unrelated to the commit above, there may be other issues too,
as the cpsw phy LED seems to come on only after about five seconds, with
a total of about 10 seconds before the Ethernet is up again.

Regards,

Tony


* Re: Regression in v4.20 with net phy soft reset changes
  2019-01-09 19:06 Regression in v4.20 with net phy soft reset changes Tony Lindgren
@ 2019-01-09 19:28 ` Heiner Kallweit
  2019-01-09 21:36   ` Tony Lindgren
  0 siblings, 1 reply; 7+ messages in thread
From: Heiner Kallweit @ 2019-01-09 19:28 UTC
  To: Tony Lindgren, Florian Fainelli, David S. Miller
  Cc: Andrew Lunn, Bartosz Golaszewski, Chris Healy, Clemens Gruber,
	Grygorii Strashko, Ivan Khoronzhuk, Keerthy, Murali Karicheri,
	Rex Chang, Sekhar Nori, Tero Kristo, WingMan Kwok, netdev,
	linux-kernel, linux-arm-kernel, linux-omap

On 09.01.2019 20:06, Tony Lindgren wrote:
> Hi all,
> 
> Commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") caused
> a regression where suspend resume cycle fails to bring up Ethernet on at
> least cpsw on am437x-sk-evm.
> 
What kind of PHY and which PHY driver are used with this board?
I found a schematic of an am437x board where a KSZ9031RN PHY is used.
Is it the same on your board?

As described in the commit message of this commit you would have
the option to implement the soft_reset callback in the PHY driver.
Can you try to add .soft_reset = genphy_soft_reset to the
KSZ9031 driver config in drivers/net/phy/micrel.c and check whether
it fixes the issue?
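
For illustration, here is a minimal user-space sketch of why wiring up the
callback can matter. It assumes that, since the commit above, the PHY core
soft-resets a PHY during init only when its driver provides .soft_reset.
The struct layouts, reset_count, phy_init_hw_sketch() and the two driver
entries are simplified stand-ins, not the actual code in drivers/net/phy/;
only the .soft_reset = genphy_soft_reset line mirrors the suggested change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's struct phy_device
 * and struct phy_driver; only the fields needed here are modelled. */
struct phy_device;

struct phy_driver {
	const char *name;
	int (*soft_reset)(struct phy_device *phydev);
};

struct phy_device {
	const struct phy_driver *drv;
	int reset_count;	/* illustration only: counts soft resets */
};

/* Stand-in for the kernel's genphy_soft_reset(); it only counts calls. */
int genphy_soft_reset(struct phy_device *phydev)
{
	phydev->reset_count++;
	return 0;
}

/* Assumed post-commit behaviour: reset only if the driver asks for it. */
int phy_init_hw_sketch(struct phy_device *phydev)
{
	assert(phydev != NULL);

	if (!phydev->drv)
		return 0;

	if (phydev->drv->soft_reset)
		return phydev->drv->soft_reset(phydev);

	return 0;	/* no callback: no soft reset is performed */
}

/* Driver entry without the hook (illustrative, not the real entry). */
const struct phy_driver ksz9031_unpatched = {
	.name = "KSZ9031 (sketch, no soft_reset)",
};

/* Driver entry with the suggested one-line addition. */
const struct phy_driver ksz9031_patched = {
	.name = "KSZ9031 (sketch, with soft_reset)",
	.soft_reset = genphy_soft_reset,
};
```

With the unpatched entry, phy_init_hw_sketch() returns without any reset;
with the patched entry, the generic reset runs once per call.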

> Keerthy noticed this may not happen on the first resume, but usually
> happens after few suspend resume cycles. The most working suspend resume
> cycles I've seen with the commit above is three.
> 
> Any ideas what might be going wrong?
> 
> Note that unrelated to the commit above, there may be other issues too
> as the cpsw phy LED seems to come on only after about five seconds with
> about total of 10 seconds before the Ethernet is up again.
> 
> Regards,
> 
> Tony
> 
Heiner


* Re: Regression in v4.20 with net phy soft reset changes
  2019-01-09 19:28 ` Heiner Kallweit
@ 2019-01-09 21:36   ` Tony Lindgren
  2019-01-09 21:54     ` Heiner Kallweit
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Tony Lindgren @ 2019-01-09 21:36 UTC
  To: Heiner Kallweit, Sekhar Nori
  Cc: Florian Fainelli, David S. Miller, Andrew Lunn,
	Bartosz Golaszewski, Chris Healy, Clemens Gruber,
	Grygorii Strashko, Ivan Khoronzhuk, Keerthy, Murali Karicheri,
	Rex Chang, Tero Kristo, WingMan Kwok, netdev, linux-kernel,
	linux-arm-kernel, linux-omap

Hi,

* Heiner Kallweit <hkallweit1@gmail.com> [190109 19:28]:
> On 09.01.2019 20:06, Tony Lindgren wrote:
> > Commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") caused
> > a regression where suspend resume cycle fails to bring up Ethernet on at
> > least cpsw on am437x-sk-evm.
> > 
> What kind of PHY and which PHY driver is used with this board?
> I found one schematics of am437x where a KSZ9031RN PHY is used.
> Is it the same on your board?

Yes that's the phy.

> As described in the commit message of this commit you would have
> the option to implement the soft_reset callback in the PHY driver.
> Can you try to add .soft_reset = genphy_soft_reset to the
> KSZ9031 driver config in drivers/net/phy/micrel.c and check whether
> it fixes the issue?

Yes that seems to work based on a quick test of five suspend
resume cycles.

I wonder what all hardware this issue affects though?

It's probably best that the network folks check which
hardware needs patching.

For TI hardware, Sekhar and TI network folks, can you guys
please check the various TI SoCs for multiple suspend resume
cycles with v5.0-rc1 and patch accordingly? See also below
for something else to check, 10 seconds to resume a phy
seems very long to me :)

> > Keerthy noticed this may not happen on the first resume, but usually
> > happens after few suspend resume cycles. The most working suspend resume
> > cycles I've seen with the commit above is three.
...
> > Note that unrelated to the commit above, there may be other issues too
> > as the cpsw phy LED seems to come on only after about five seconds with
> > about total of 10 seconds before the Ethernet is up again.

Regards,

Tony


* Re: Regression in v4.20 with net phy soft reset changes
  2019-01-09 21:36   ` Tony Lindgren
@ 2019-01-09 21:54     ` Heiner Kallweit
       [not found]     ` <5bdce91b-cd56-75eb-c6d2-5542869b716e@ti.com>
  2019-01-10 11:52     ` Sekhar Nori
  2 siblings, 0 replies; 7+ messages in thread
From: Heiner Kallweit @ 2019-01-09 21:54 UTC
  To: Tony Lindgren, Sekhar Nori
  Cc: Florian Fainelli, David S. Miller, Andrew Lunn,
	Bartosz Golaszewski, Chris Healy, Clemens Gruber,
	Grygorii Strashko, Ivan Khoronzhuk, Keerthy, Murali Karicheri,
	Rex Chang, Tero Kristo, WingMan Kwok, netdev, linux-kernel,
	linux-arm-kernel, linux-omap

On 09.01.2019 22:36, Tony Lindgren wrote:
> Hi,
> 
> * Heiner Kallweit <hkallweit1@gmail.com> [190109 19:28]:
>> On 09.01.2019 20:06, Tony Lindgren wrote:
>>> Commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") caused
>>> a regression where suspend resume cycle fails to bring up Ethernet on at
>>> least cpsw on am437x-sk-evm.
>>>
>> What kind of PHY and which PHY driver is used with this board?
>> I found one schematics of am437x where a KSZ9031RN PHY is used.
>> Is it the same on your board?
> 
> Yes that's the phy.
> 
>> As described in the commit message of this commit you would have
>> the option to implement the soft_reset callback in the PHY driver.
>> Can you try to add .soft_reset = genphy_soft_reset to the
>> KSZ9031 driver config in drivers/net/phy/micrel.c and check whether
>> it fixes the issue?
> 
> Yes that seems to work based on a quick test of five suspend
> resume cycles.
> 
> I wonder what all hardware this issue affects though?
> 
Microchip is one of the few vendors that publish errata documentation,
like this one for the KSZ9031RNX:
http://ww1.microchip.com/downloads/en/DeviceDoc/80000692D.pdf
I wonder whether this applies to the PHY in our case and
whether the need for an extra soft reset is caused by one of
the issues it mentions.

> It's probably best that the network folks check what all
> hardware needs patching.
> 
> For TI hardware, Sekhar and TI network folks, can you guys
> please check the various TI SoCs for multiple suspend resume
> cycles with v5.0-rc1 and patch accordingly? See also below
> for something else to check, 10 seconds to resume a phy
> seems very long to me :)
> 
>>> Keerthy noticed this may not happen on the first resume, but usually
>>> happens after few suspend resume cycles. The most working suspend resume
>>> cycles I've seen with the commit above is three.
> ...
>>> Note that unrelated to the commit above, there may be other issues too
>>> as the cpsw phy LED seems to come on only after about five seconds with
>>> about total of 10 seconds before the Ethernet is up again.
> 
> Regards,
> 
> Tony
> 
Heiner


* Re: Regression in v4.20 with net phy soft reset changes
       [not found]     ` <5bdce91b-cd56-75eb-c6d2-5542869b716e@ti.com>
@ 2019-01-10  5:50       ` Keerthy
  0 siblings, 0 replies; 7+ messages in thread
From: Keerthy @ 2019-01-10  5:50 UTC
  To: Tony Lindgren, Heiner Kallweit, Sekhar Nori
  Cc: Florian Fainelli, David S. Miller, Andrew Lunn,
	Bartosz Golaszewski, Chris Healy, Clemens Gruber,
	Ivan Khoronzhuk, Murali Karicheri, Rex Chang, Tero Kristo,
	WingMan Kwok, netdev, linux-kernel, linux-arm-kernel, linux-omap



On Thursday 10 January 2019 10:36 AM, Keerthy wrote:
> 
> 
> On Thursday 10 January 2019 03:06 AM, Tony Lindgren wrote:
>> Hi,
>>
>> * Heiner Kallweit <hkallweit1@gmail.com> [190109 19:28]:
>>> On 09.01.2019 20:06, Tony Lindgren wrote:
>>>> Commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") caused
>>>> a regression where suspend resume cycle fails to bring up Ethernet on at
>>>> least cpsw on am437x-sk-evm.
>>>>
>>> What kind of PHY and which PHY driver is used with this board?
>>> I found one schematics of am437x where a KSZ9031RN PHY is used.
>>> Is it the same on your board?
>>
>> Yes that's the phy.
>>
>>> As described in the commit message of this commit you would have
>>> the option to implement the soft_reset callback in the PHY driver.
>>> Can you try to add .soft_reset = genphy_soft_reset to the
>>> KSZ9031 driver config in drivers/net/phy/micrel.c and check whether
>>> it fixes the issue?
>>
>> Yes that seems to work based on a quick test of five suspend
>> resume cycles.
> 

Removing Grygorii as the e-mail address no longer exists.

To add to Tony's observations: I ran 100 suspend/resume cycles after adding
the soft_reset hook to the phy_driver structs in drivers/net/phy/micrel.c,
and suspend/resume (DS0) works nicely.

> 
>>
>> I wonder what all hardware this issue affects though?
>>
>> It's probably best that the network folks check what all
>> hardware needs patching.
>>
>> For TI hardware, Sekhar and TI network folks, can you guys
>> please check the various TI SoCs for multiple suspend resume
>> cycles with v5.0-rc1 and patch accordingly? See also below
>> for something else to check, 10 seconds to resume a phy
>> seems very long to me :)
>>
>>>> Keerthy noticed this may not happen on the first resume, but usually
>>>> happens after few suspend resume cycles. The most working suspend resume
>>>> cycles I've seen with the commit above is three.
>> ...
>>>> Note that unrelated to the commit above, there may be other issues too
>>>> as the cpsw phy LED seems to come on only after about five seconds with
>>>> about total of 10 seconds before the Ethernet is up again.
>>
>> Regards,
>>
>> Tony
>>


* Re: Regression in v4.20 with net phy soft reset changes
  2019-01-09 21:36   ` Tony Lindgren
  2019-01-09 21:54     ` Heiner Kallweit
       [not found]     ` <5bdce91b-cd56-75eb-c6d2-5542869b716e@ti.com>
@ 2019-01-10 11:52     ` Sekhar Nori
  2019-01-10 16:09       ` Tony Lindgren
  2 siblings, 1 reply; 7+ messages in thread
From: Sekhar Nori @ 2019-01-10 11:52 UTC
  To: Tony Lindgren, Heiner Kallweit
  Cc: Florian Fainelli, David S. Miller, Andrew Lunn,
	Bartosz Golaszewski, Chris Healy, Clemens Gruber,
	Ivan Khoronzhuk, Keerthy, Murali Karicheri, Rex Chang,
	Tero Kristo, WingMan Kwok, netdev, linux-kernel,
	linux-arm-kernel, linux-omap

Hi Tony,

On 10/01/19 3:06 AM, Tony Lindgren wrote:
> Hi,
> 
> * Heiner Kallweit <hkallweit1@gmail.com> [190109 19:28]:
>> On 09.01.2019 20:06, Tony Lindgren wrote:
>>> Commit 6e2d85ec0559 ("net: phy: Stop with excessive soft reset") caused
>>> a regression where suspend resume cycle fails to bring up Ethernet on at
>>> least cpsw on am437x-sk-evm.
>>>
>> What kind of PHY and which PHY driver is used with this board?
>> I found one schematics of am437x where a KSZ9031RN PHY is used.
>> Is it the same on your board?
> 
> Yes that's the phy.
> 
>> As described in the commit message of this commit you would have
>> the option to implement the soft_reset callback in the PHY driver.
>> Can you try to add .soft_reset = genphy_soft_reset to the
>> KSZ9031 driver config in drivers/net/phy/micrel.c and check whether
>> it fixes the issue?
> 
> Yes that seems to work based on a quick test of five suspend
> resume cycles.
> 
> I wonder what all hardware this issue affects though?
> 
> It's probably best that the network folks check what all
> hardware needs patching.
> 
> For TI hardware, Sekhar and TI network folks, can you guys
> please check the various TI SoCs for multiple suspend resume
> cycles with v5.0-rc1 and patch accordingly? See also below

Will do.

> for something else to check, 10 seconds to resume a phy
> seems very long to me :)

On the AM437x GP EVM, which uses the same PHY, the link does not even
come up for me after unplugging and replugging the cable. The link is up
at boot (I use NFS). This only happens with v4.20 and v5.0-rc1, not with v4.19.

Adding the genphy_soft_reset hook solves the issue and link comes back
up almost immediately. I checked this with v5.0-rc1.
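
As background on what such a hook does: a generic soft reset conventionally
sets the self-clearing reset bit (bit 15) in the Basic Mode Control Register
and polls until the PHY clears it. Below is a rough sketch against a
simulated register; struct fake_phy, its helpers, soft_reset_sketch() and
the poll limit are hypothetical, and the real genphy_soft_reset() may differ
in detail.

```c
#include <assert.h>
#include <stdint.h>

#define MII_BMCR   0x00    /* Basic Mode Control Register */
#define BMCR_RESET 0x8000  /* self-clearing software reset bit */

/* Hypothetical simulated PHY: the reset bit clears after a few reads. */
struct fake_phy {
	uint16_t bmcr;
	int reads_until_clear;
};

int fake_phy_read(struct fake_phy *phy, int reg)
{
	assert(reg == MII_BMCR);
	/* Model the hardware finishing its reset after some polls. */
	if ((phy->bmcr & BMCR_RESET) && --phy->reads_until_clear <= 0)
		phy->bmcr &= (uint16_t)~BMCR_RESET;
	return phy->bmcr;
}

void fake_phy_write(struct fake_phy *phy, int reg, uint16_t val)
{
	assert(reg == MII_BMCR);
	phy->bmcr = val;
}

/* Returns 0 once the reset bit has cleared, -1 if it never does. */
int soft_reset_sketch(struct fake_phy *phy, int max_polls)
{
	fake_phy_write(phy, MII_BMCR, BMCR_RESET);

	for (int i = 0; i < max_polls; i++) {
		if (!(fake_phy_read(phy, MII_BMCR) & BMCR_RESET))
			return 0;
		/* A real driver would sleep between polls. */
	}
	return -1;
}
```

A PHY that clears the bit within the poll budget yields 0; one that stays in
reset for longer yields -1, which a caller would treat as a timeout.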

I don't see the link problem if I switch to 100 Mbps using ethtool prior
to the plug/unplug experiment. So it looks like the problem is
restricted to Gigabit links only. Are you using a Gigabit link too?

I think we should patch drivers/net/phy/micrel.c to solve the
regression. I'm not sure of the root cause though. In the errata pointed to
by Heiner, "Module 6" comes close to what we are seeing, except that it
describes a scenario where auto-negotiation is turned off and a 100M
link is used, whereas we see the issue even with auto-neg on and in gigabit
mode. "Module 5" is also related to link failure, but is already worked
around in the kernel with ksz9031_center_flp_timing().

> 
>>> Keerthy noticed this may not happen on the first resume, but usually
>>> happens after few suspend resume cycles. The most working suspend resume
>>> cycles I've seen with the commit above is three.
> ...
>>> Note that unrelated to the commit above, there may be other issues too
>>> as the cpsw phy LED seems to come on only after about five seconds with
>>> about total of 10 seconds before the Ethernet is up again.

I don't quite see this problem on the AM437x GP EVM. I have seen the gigabit
link take quite some time to come up (sometimes more than 10 seconds) on the
x15 EVM. Not sure if the problem you are facing is related to gigabit too.
If you are using a gigabit link, can you downgrade to 100 Mbps to check?
Either use a 100M-only switch or use ethtool on the EVM.

$ ethtool -s eth0 speed 100 duplex full

Thanks,
Sekhar


* Re: Regression in v4.20 with net phy soft reset changes
  2019-01-10 11:52     ` Sekhar Nori
@ 2019-01-10 16:09       ` Tony Lindgren
  0 siblings, 0 replies; 7+ messages in thread
From: Tony Lindgren @ 2019-01-10 16:09 UTC
  To: Sekhar Nori
  Cc: Heiner Kallweit, Florian Fainelli, David S. Miller, Andrew Lunn,
	Bartosz Golaszewski, Chris Healy, Clemens Gruber,
	Ivan Khoronzhuk, Keerthy, Murali Karicheri, Rex Chang,
	Tero Kristo, WingMan Kwok, netdev, linux-kernel,
	linux-arm-kernel, linux-omap

* Sekhar Nori <nsekhar@ti.com> [190110 11:52]:
> On 10/01/19 3:06 AM, Tony Lindgren wrote:
> > For TI hardware, Sekhar and TI network folks, can you guys
> > please check the various TI SoCs for multiple suspend resume
> > cycles with v5.0-rc1 and patch accordingly? See also below
> 
> Will do.

OK thanks!

> I don't see the link problem if I shift to 100 Mbps prior to the
> plug/unplug experiment using ethtool. So looks like the problem is
> restricted to Gigabit link only. Are you using Gigabit link too?

Yes this is a Gigabit link.

> I think we should patch drivers/net/phy/micrel.c to solve the
> regression. Not sure of the root cause though. In the errata pointed to
> by Heiner, there is "Module 6" which comes close to what we are seeing,
> except it talks of a scenario where auto-negotiation is turned off 100M
> link is used and we see the issue even with auto-neg on and in gigabit
> mode. "Module 5" is also related to link failure, but is already worked
> around in kernel with ksz9031_center_flp_timing().
>
> >>> Keerthy noticed this may not happen on the first resume, but usually
> >>> happens after few suspend resume cycles. The most working suspend resume
> >>> cycles I've seen with the commit above is three.
> > ...
> >>> Note that unrelated to the commit above, there may be other issues too
> >>> as the cpsw phy LED seems to come on only after about five seconds with
> >>> about total of 10 seconds before the Ethernet is up again.
> 
> I don't quite see this problem on the AM437x GP EVM. I have seen gigabit
> link takes quite some time (sometimes more than 10 seconds) on x15 EVM.
> Not sure if the problem you are facing is related to gigabit too. If you
> are using gigabit link, can you downgrade to 100 Mbps to check? Either
> using a 100M only switch or by using ethtool on the EVM.
> 
> $ ethtool -s eth0 speed 100 duplex full

Yes this makes the 10 second resume latency go away on am437x-sk-evm
here.

Regards,

Tony

