netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* net: phylink: supported modes set to 0 with genphy sfp module
@ 2020-05-11 15:45 Julien Beraud
  2020-05-11 18:29 ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 6+ messages in thread
From: Julien Beraud @ 2020-05-11 15:45 UTC (permalink / raw)
  To: netdev, Russell King; +Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit

Following commit:

commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87
Author: Russell King <rmk+kernel@armlinux.org.uk>
Date:   Wed Dec 11 10:56:45 2019 +0000

     net: phylink: delay MAC configuration for copper SFP modules


In function phylink_sfp_connect_phy, phylink_sfp_config is called before 
phylink_attach_phy.

In the case of a genphy, the "supported" field of the phy_device is 
filled by:
phylink_attach_phy->phy_attach_direct->phy_probe->genphy_read_abilities.

It means that:

ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
will have phy->supported with no bits set, and then the first call to 
phylink_validate in phylink_sfp_config will return an error:

return phylink_is_empty_linkmode(supported) ? -EINVAL : 0;

this results in putting the sfp driver in "failed" state.

Thanks,
Julien Beraud





^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: net: phylink: supported modes set to 0 with genphy sfp module
  2020-05-11 15:45 net: phylink: supported modes set to 0 with genphy sfp module Julien Beraud
@ 2020-05-11 18:29 ` Russell King - ARM Linux admin
  2020-05-11 19:06   ` Florian Fainelli
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux admin @ 2020-05-11 18:29 UTC (permalink / raw)
  To: Julien Beraud; +Cc: netdev, Andrew Lunn, Florian Fainelli, Heiner Kallweit

On Mon, May 11, 2020 at 05:45:02PM +0200, Julien Beraud wrote:
> Following commit:
> 
> commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87
> Author: Russell King <rmk+kernel@armlinux.org.uk>
> Date:   Wed Dec 11 10:56:45 2019 +0000
> 
>     net: phylink: delay MAC configuration for copper SFP modules
> 
> 
> In function phylink_sfp_connect_phy, phylink_sfp_config is called before
> phylink_attach_phy.
> 
> In the case of a genphy, the "supported" field of the phy_device is filled
> by:
> phylink_attach_phy->phy_attach_direct->phy_probe->genphy_read_abilities.
> 
> It means that:
> 
> ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
> will have phy->supported with no bits set, and then the first call to
> phylink_validate in phylink_sfp_config will return an error:
> 
> return phylink_is_empty_linkmode(supported) ? -EINVAL : 0;
> 
> this results in putting the sfp driver in "failed" state.

Which PHY is this?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: net: phylink: supported modes set to 0 with genphy sfp module
  2020-05-11 18:29 ` Russell King - ARM Linux admin
@ 2020-05-11 19:06   ` Florian Fainelli
  2020-05-12  9:28     ` Julien Beraud
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Fainelli @ 2020-05-11 19:06 UTC (permalink / raw)
  To: Russell King - ARM Linux admin, Julien Beraud
  Cc: netdev, Andrew Lunn, Heiner Kallweit



On 5/11/2020 11:29 AM, Russell King - ARM Linux admin wrote:
> On Mon, May 11, 2020 at 05:45:02PM +0200, Julien Beraud wrote:
>> Following commit:
>>
>> commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87
>> Author: Russell King <rmk+kernel@armlinux.org.uk>
>> Date:   Wed Dec 11 10:56:45 2019 +0000
>>
>>     net: phylink: delay MAC configuration for copper SFP modules
>>
>>
>> In function phylink_sfp_connect_phy, phylink_sfp_config is called before
>> phylink_attach_phy.
>>
>> In the case of a genphy, the "supported" field of the phy_device is filled
>> by:
>> phylink_attach_phy->phy_attach_direct->phy_probe->genphy_read_abilities.
>>
>> It means that:
>>
>> ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
>> will have phy->supported with no bits set, and then the first call to
>> phylink_validate in phylink_sfp_config will return an error:
>>
>> return phylink_is_empty_linkmode(supported) ? -EINVAL : 0;
>>
>> this results in putting the sfp driver in "failed" state.
> 
> Which PHY is this?

Using the generic PHY with a copper SFP module does not sound like a
great idea because without a specialized PHY driver (that is, not the
Generic PHY driver) there is not usually much that can happen.
-- 
Florian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: net: phylink: supported modes set to 0 with genphy sfp module
  2020-05-11 19:06   ` Florian Fainelli
@ 2020-05-12  9:28     ` Julien Beraud
  2020-05-12 12:28       ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 6+ messages in thread
From: Julien Beraud @ 2020-05-12  9:28 UTC (permalink / raw)
  To: Florian Fainelli, Russell King - ARM Linux admin
  Cc: netdev, Andrew Lunn, Heiner Kallweit, Antoine Tenart



On 11/05/2020 21:06, Florian Fainelli wrote:
> 
> 
> On 5/11/2020 11:29 AM, Russell King - ARM Linux admin wrote:
>> On Mon, May 11, 2020 at 05:45:02PM +0200, Julien Beraud wrote:
>>> Following commit:
>>>
>>> commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87
>>> Author: Russell King <rmk+kernel@armlinux.org.uk>
>>> Date:   Wed Dec 11 10:56:45 2019 +0000
>>>
>>>      net: phylink: delay MAC configuration for copper SFP modules
>>>
>>>
>>> In function phylink_sfp_connect_phy, phylink_sfp_config is called before
>>> phylink_attach_phy.
>>>
>>> In the case of a genphy, the "supported" field of the phy_device is filled
>>> by:
>>> phylink_attach_phy->phy_attach_direct->phy_probe->genphy_read_abilities.
>>>
>>> It means that:
>>>
>>> ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
>>> will have phy->supported with no bits set, and then the first call to
>>> phylink_validate in phylink_sfp_config will return an error:
>>>
>>> return phylink_is_empty_linkmode(supported) ? -EINVAL : 0;
>>>
>>> this results in putting the sfp driver in "failed" state.
>>
>> Which PHY is this?

The phy seems to be Marvell 88E1111, so the simple solution is to just add the driver for this PHY to my config.
That said, if for some reason someone plugs a module for which no phy driver is found the issue will happen again.

> 
> Using the generic PHY with a copper SFP module does not sound like a
> great idea because without a specialized PHY driver (that is, not the
> Generic PHY driver) there is not usually much that can happen.
Thanks for the info. I don't have an advice on whether it is a good idea to use a copper sfp without a specialized driver,
but before commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87, it used to work and I could at least get a network connection.

Moreover, this commit didn't explicitely intend to forbid this behavior and the error is not explicit either.

If phylink+sfp still supports using genphy as a fallback, It may be good to fix the current behavior.
If not, maybe adding an explicit warning or error would be preferrable.

-------------------
A bit more details about the issue:
- The board I am using has a MAC connected to a PCS, connected to an sfp cage, so no PHY on-board.

- No driver is found for the PHY that's on the sfp module I am using.

- The MAC + PCS driver's conversion to phylink still needs to be upstreamed, and we'll send patches soon hopefully.
I can make the code available if needed, but the issue doesn't depend on the MAC+PCS driver.
It has been reproduced on mcbin by Antoine Tenart (thanks!) by just removing the driver for the phy he is using from his kernel.
It should be reproducible on any board with such a setup. The IP I am using is altera triple speed ethernet MAC + PCS.

- The issue happens when plugging an sfp module in the sfp cage which is copper and embeds a PHY for which no driver is found and falls back to genphy.

The error that happens is the following (call trace plus debug prints plus a few debug prints I have added :

[   27.607215] sfp soc:sfp1: mod-def0 1 -> 0
[   27.611235] sfp soc:sfp1: tx-fault 0 -> 1
[   27.615247] sfp soc:sfp1: SM: enter present:up:fail event remove
[   27.626668] sfp soc:sfp1: module removed
[   27.632415] sfp soc:sfp1: tx disable 0 -> 1
[   27.636618] sfp soc:sfp1: SM: exit empty:up:down
[   27.644541] sfp soc:sfp1: SM: enter empty:up:down event tx_fault
[   27.652282] sfp soc:sfp1: SM: exit empty:up:down
[   29.077218] sfp soc:sfp1: mod-def0 0 -> 1
[   29.081238] sfp soc:sfp1: tx-fault 1 -> 0
[   29.085250] sfp soc:sfp1: SM: enter empty:up:down event insert
[   29.096669] sfp soc:sfp1: SM: exit probe:up:down
[   29.109535] sfp soc:sfp1: SM: enter probe:up:down event tx_clear
[   29.116892] sfp soc:sfp1: SM: exit probe:up:down
[   29.397193] sfp soc:sfp1: SM: enter probe:up:down event timeout
[   29.415422] sfp soc:sfp1: module AVAGO            ABCU-5731ARZ rev      sn AGC14275317E     dc 140704
[   29.425204] sfp soc:sfp1: tx disable 1 -> 0
[   29.429448] sfp soc:sfp1: SM: exit present:up:wait
[   29.487209] sfp soc:sfp1: SM: enter present:up:wait event timeout
[   29.502605] sfp soc:sfp1: sfp_sm_probe_phy, support 00,00000000,00000000
[   29.509396] altera_tse c0200000.ethernet eth0: phylink_sfp_connect_phy-1, support 00,00000000,00000000
[   29.518749] altera_tse c0200000.ethernet eth0: alt_tse_validate, support 00,00000000,00000000
[   29.527311] altera_tse c0200000.ethernet eth0: supported : 00,00000000,00000000
[   29.534614] altera_tse c0200000.ethernet eth0: validation with support 00,00000000,00000000 failed: -22
[   29.544232] sfp soc:sfp1: sfp_add_phy failed: -22
[   29.548999] sfp soc:sfp1: SM: exit present:up:fail
[   29.557243] sfp soc:sfp1: los 1 -> 0
[   29.560833] sfp soc:sfp1: SM: enter present:up:fail event los_low
[   29.566918] sfp soc:sfp1: SM: exit present:up:fail
[   29.677195] sfp soc:sfp1: los 0 -> 1
[   29.680782] sfp soc:sfp1: SM: enter present:up:fail event los_high
[   29.686950] sfp soc:sfp1: SM: exit present:up:fail

And the call trace leading to the phylink_validate call that fails :
  => phylink_validate
  => phylink_sfp_config
  => phylink_sfp_connect_phy
  => sfp_add_phy
  => sfp_sm_probe_phy
  => sfp_sm_event
  => sfp_timeout
  => process_one_work
  => worker_thread
  => kthread
  => ret_from_fork
  => 0

------------------------------------------------------------
This issue disappears when I revert commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87, and the network connection seems to work.

It seems that before this commit, the call to phylink_attach_phy was made before calling phylink_sfp_config.
It also seems that in case the sfp module embeds a PHY for which there no specific driver is found,
the phy_probe function which fills the phy->supported field is called following the call to phylink_attach_phy.
So the subsequent call to phylink_sfp_config is passed with phy->supported set to 0, resulting in :
"validation with support 00,00000000,00000000 failed: -22"

Antoine proposed me the following patch as a workaround and it gets back to working like before with it:
-----------------------------------------------------
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 0f23bec431c1..737da4d146ce 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2002,6 +2002,8 @@ static bool phylink_phy_no_inband(struct phy_device *phy)
  
  static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
  {
+       unsigned long *supported, *advertising;
+       bool genphy = !phy->mdio.dev.driver;
         struct phylink *pl = upstream;
         phy_interface_t interface;
         u8 mode;
@@ -2021,8 +2023,15 @@ static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
         else
                 mode = MLO_AN_INBAND;
  
+       /* If a PHY driver has been probed use the PHY's reported capabilities;
+        * otherwise we're using genphy and the PHY capabilities won't be known
+        * before the PHY has been attached.
+        */
+       supported = genphy ? pl->sfp_support : phy->supported;
+       advertising = genphy ? pl->sfp_support : phy->advertising;
+
         /* Do the initial configuration */
-       ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
+       ret = phylink_sfp_config(pl, mode, supported, advertising);
         if (ret < 0)
                 return ret;
  
---------------------------------------------------------

I haven't spent time digging a bit more about the consequences of such a patch and neither did he afaik.
I hope this info is valuable, thanks.

Regards,
Julien

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: net: phylink: supported modes set to 0 with genphy sfp module
  2020-05-12  9:28     ` Julien Beraud
@ 2020-05-12 12:28       ` Russell King - ARM Linux admin
  2020-05-13  7:14         ` Julien Beraud
  0 siblings, 1 reply; 6+ messages in thread
From: Russell King - ARM Linux admin @ 2020-05-12 12:28 UTC (permalink / raw)
  To: Julien Beraud
  Cc: Florian Fainelli, netdev, Andrew Lunn, Heiner Kallweit, Antoine Tenart

On Tue, May 12, 2020 at 11:28:40AM +0200, Julien Beraud wrote:
> 
> 
> On 11/05/2020 21:06, Florian Fainelli wrote:
> > 
> > 
> > On 5/11/2020 11:29 AM, Russell King - ARM Linux admin wrote:
> > > On Mon, May 11, 2020 at 05:45:02PM +0200, Julien Beraud wrote:
> > > > Following commit:
> > > > 
> > > > commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87
> > > > Author: Russell King <rmk+kernel@armlinux.org.uk>
> > > > Date:   Wed Dec 11 10:56:45 2019 +0000
> > > > 
> > > >      net: phylink: delay MAC configuration for copper SFP modules
> > > > 
> > > > 
> > > > In function phylink_sfp_connect_phy, phylink_sfp_config is called before
> > > > phylink_attach_phy.
> > > > 
> > > > In the case of a genphy, the "supported" field of the phy_device is filled
> > > > by:
> > > > phylink_attach_phy->phy_attach_direct->phy_probe->genphy_read_abilities.
> > > > 
> > > > It means that:
> > > > 
> > > > ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
> > > > will have phy->supported with no bits set, and then the first call to
> > > > phylink_validate in phylink_sfp_config will return an error:
> > > > 
> > > > return phylink_is_empty_linkmode(supported) ? -EINVAL : 0;
> > > > 
> > > > this results in putting the sfp driver in "failed" state.
> > > 
> > > Which PHY is this?
> 
> The phy seems to be Marvell 88E1111, so the simple solution is to just add the driver for this PHY to my config.
> That said, if for some reason someone plugs a module for which no phy driver is found the issue will happen again.

Right, please use the specific PHY driver for modules that contain the
88E1111 to avoid any surprises, otherwise things can end up being
misconfigured - for example, the PHY using 1000base-X and the host
using SGMII, which may work or may lead to problems.

> > Using the generic PHY with a copper SFP module does not sound like a
> > great idea because without a specialized PHY driver (that is, not the
> > Generic PHY driver) there is not usually much that can happen.
> Thanks for the info. I don't have an advice on whether it is a good idea to use a copper sfp without a specialized driver,
> but before commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87, it used to work and I could at least get a network connection.
> 
> Moreover, this commit didn't explicitely intend to forbid this behavior and the error is not explicit either.
> 
> If phylink+sfp still supports using genphy as a fallback, It may be good to fix the current behavior.
> If not, maybe adding an explicit warning or error would be preferrable.

The commit is designed to increase the range of modules that can be
used, especially when the module supports higher rates than the MAC.

The downside is that we _must_ know the PHYs capabilities before
attaching to it, so that we can choose an appropriate interface to
_attach_ to it with.  It's a chicken and egg problem.

For this to work reliably in cases such as those you've identified,
I need phylib to always give me that information earlier than it
currently seems to for the genphy fallback - but the problem is the
genphy fallback only happens after attaching to it.  So again,
another chicken and egg problem.

Subsituting the SFP modules capabilities seems like a workaround,
but those are also a guess from poor information in the SFP EEPROM.
It is what we were doing before however...

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: net: phylink: supported modes set to 0 with genphy sfp module
  2020-05-12 12:28       ` Russell King - ARM Linux admin
@ 2020-05-13  7:14         ` Julien Beraud
  0 siblings, 0 replies; 6+ messages in thread
From: Julien Beraud @ 2020-05-13  7:14 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Florian Fainelli, netdev, Andrew Lunn, Heiner Kallweit, Antoine Tenart



On 12/05/2020 14:28, Russell King - ARM Linux admin wrote:
> On Tue, May 12, 2020 at 11:28:40AM +0200, Julien Beraud wrote:
>>
>>
>> On 11/05/2020 21:06, Florian Fainelli wrote:
>>>
>>>
>>> On 5/11/2020 11:29 AM, Russell King - ARM Linux admin wrote:
>>>> On Mon, May 11, 2020 at 05:45:02PM +0200, Julien Beraud wrote:
>>>>> Following commit:
>>>>>
>>>>> commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87
>>>>> Author: Russell King <rmk+kernel@armlinux.org.uk>
>>>>> Date:   Wed Dec 11 10:56:45 2019 +0000
>>>>>
>>>>>       net: phylink: delay MAC configuration for copper SFP modules
>>>>>
>>>>>
>>>>> In function phylink_sfp_connect_phy, phylink_sfp_config is called before
>>>>> phylink_attach_phy.
>>>>>
>>>>> In the case of a genphy, the "supported" field of the phy_device is filled
>>>>> by:
>>>>> phylink_attach_phy->phy_attach_direct->phy_probe->genphy_read_abilities.
>>>>>
>>>>> It means that:
>>>>>
>>>>> ret = phylink_sfp_config(pl, mode, phy->supported, phy->advertising);
>>>>> will have phy->supported with no bits set, and then the first call to
>>>>> phylink_validate in phylink_sfp_config will return an error:
>>>>>
>>>>> return phylink_is_empty_linkmode(supported) ? -EINVAL : 0;
>>>>>
>>>>> this results in putting the sfp driver in "failed" state.
>>>>
>>>> Which PHY is this?
>>
>> The phy seems to be Marvell 88E1111, so the simple solution is to just add the driver for this PHY to my config.
>> That said, if for some reason someone plugs a module for which no phy driver is found the issue will happen again.
> 
> Right, please use the specific PHY driver for modules that contain the
> 88E1111 to avoid any surprises, otherwise things can end up being
> misconfigured - for example, the PHY using 1000base-X and the host
> using SGMII, which may work or may lead to problems.
> 
I'll do that, thanks.

>>> Using the generic PHY with a copper SFP module does not sound like a
>>> great idea because without a specialized PHY driver (that is, not the
>>> Generic PHY driver) there is not usually much that can happen.
>> Thanks for the info. I don't have an advice on whether it is a good idea to use a copper sfp without a specialized driver,
>> but before commit 52c956003a9d5bcae1f445f9dfd42b624adb6e87, it used to work and I could at least get a network connection.
>>
>> Moreover, this commit didn't explicitely intend to forbid this behavior and the error is not explicit either.
>>
>> If phylink+sfp still supports using genphy as a fallback, It may be good to fix the current behavior.
>> If not, maybe adding an explicit warning or error would be preferrable.
> 
> The commit is designed to increase the range of modules that can be
> used, especially when the module supports higher rates than the MAC.
> 
> The downside is that we _must_ know the PHYs capabilities before
> attaching to it, so that we can choose an appropriate interface to
> _attach_ to it with.  It's a chicken and egg problem.
> 
> For this to work reliably in cases such as those you've identified,
> I need phylib to always give me that information earlier than it
> currently seems to for the genphy fallback - but the problem is the
> genphy fallback only happens after attaching to it.  So again,
> another chicken and egg problem.
> 
> Subsituting the SFP modules capabilities seems like a workaround,
> but those are also a guess from poor information in the SFP EEPROM.
> It is what we were doing before however...
> 

Thank you very much for the explanations.
I'll try this patch for now.

Regards,
Julien


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-05-13  7:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-11 15:45 net: phylink: supported modes set to 0 with genphy sfp module Julien Beraud
2020-05-11 18:29 ` Russell King - ARM Linux admin
2020-05-11 19:06   ` Florian Fainelli
2020-05-12  9:28     ` Julien Beraud
2020-05-12 12:28       ` Russell King - ARM Linux admin
2020-05-13  7:14         ` Julien Beraud

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).