All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
@ 2023-03-17 12:16 arturo.buzarra
  2023-03-17 17:05 ` Florian Fainelli
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: arturo.buzarra @ 2023-03-17 12:16 UTC (permalink / raw)
  To: netdev

From: Arturo Buzarra <arturo.buzarra@digi.com>

A PHY driver can dynamically determine the devices features, but in some
circunstances, the PHY is not yet ready and the read capabilities does not fail
but returns an undefined value, so incorrect capabilities are assumed and the
initialization process fails. This commit postpones the PHY probe to ensure the
PHY is accessible.

Signed-off-by: Arturo Buzarra <arturo.buzarra@digi.com>
---
 drivers/net/phy/phy_device.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 1785f1cead97..f8c31e741936 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2628,10 +2628,14 @@ int genphy_read_abilities(struct phy_device *phydev)
 			       phydev->supported);
 
 	val = phy_read(phydev, MII_BMSR);
 	if (val < 0)
 		return val;
+	if (val == 0x0000 || val == 0xffff) {
+		phydev_err(phydev, "PHY is not accessible\n");
+		return -EPROBE_DEFER;
+	}
 
 	linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->supported,
 			 val & BMSR_ANEGCAPABLE);
 
 	linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, phydev->supported,

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-17 12:16 [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible arturo.buzarra
@ 2023-03-17 17:05 ` Florian Fainelli
  2023-03-17 17:24 ` Andrew Lunn
  2023-03-17 18:21 ` Heiner Kallweit
  2 siblings, 0 replies; 12+ messages in thread
From: Florian Fainelli @ 2023-03-17 17:05 UTC (permalink / raw)
  To: arturo.buzarra, netdev, Andrew Lunn, Heiner Kallweit,
	Vladimir Oltean, Russell King

+Andrew, Heiner, Vladimir and Russell,

(please always copy the maintainers of the piece of code you are modifying).

On 3/17/23 05:16, arturo.buzarra@digi.com wrote:
> From: Arturo Buzarra <arturo.buzarra@digi.com>
> 
> A PHY driver can dynamically determine the devices features, but in some
> circunstances, the PHY is not yet ready and the read capabilities does not fail
> but returns an undefined value, so incorrect capabilities are assumed and the
> initialization process fails. This commit postpones the PHY probe to ensure the
> PHY is accessible.

We need more specifics here, what type of PHY device are you seeing this 
with? Keying off all 0s or all 1s is problematic because while it could 
signal that the PHY is not ready in your particular case, it could also 
mean that you have an electrical issue whereby the MDIO data line is 
pulled down, respectively high too hard. In that case, we would rather 
error out earlier than later, because no amount of probe deferral will 
solve that.

If your PHY requires "some time" before it can be ready you have a 
number of ways to achieve that:

- implement phy_driver::probe which may load firmware, initialize 
internal state, etc.

- implement a phy_driver::get_features

Using deferred reads until MII_BMSR contains what you want is unlikely 
to be the recommended way by your vendor to ensure the PHY is ready. 
There has got to be some sort of vendor specific register that can be 
polled to indicate readiness.


> 
> Signed-off-by: Arturo Buzarra <arturo.buzarra@digi.com>
> ---
>   drivers/net/phy/phy_device.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 1785f1cead97..f8c31e741936 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2628,10 +2628,14 @@ int genphy_read_abilities(struct phy_device *phydev)
>   			       phydev->supported);
>   
>   	val = phy_read(phydev, MII_BMSR);
>   	if (val < 0)
>   		return val;
> +	if (val == 0x0000 || val == 0xffff) {
> +		phydev_err(phydev, "PHY is not accessible\n");
> +		return -EPROBE_DEFER;
> +	}



>   
>   	linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->supported,
>   			 val & BMSR_ANEGCAPABLE);
>   
>   	linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, phydev->supported,

-- 
Florian


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-17 12:16 [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible arturo.buzarra
  2023-03-17 17:05 ` Florian Fainelli
@ 2023-03-17 17:24 ` Andrew Lunn
  2023-03-17 18:21 ` Heiner Kallweit
  2 siblings, 0 replies; 12+ messages in thread
From: Andrew Lunn @ 2023-03-17 17:24 UTC (permalink / raw)
  To: arturo.buzarra; +Cc: netdev, Florian Fainelli, Heiner Kallweit

On Fri, Mar 17, 2023 at 01:16:46PM +0100, arturo.buzarra@digi.com wrote:
> From: Arturo Buzarra <arturo.buzarra@digi.com>
> 
> A PHY driver can dynamically determine the devices features, but in some
> circunstances, the PHY is not yet ready and the read capabilities does not fail
> but returns an undefined value, so incorrect capabilities are assumed and the
> initialization process fails. This commit postpones the PHY probe to ensure the
> PHY is accessible.

In additional to Florians comments i have a few questions.

Are you resetting the device via a GPIO? Turning a regulator off/one
etc?

Is there anything in the data sheet about how long you need to wait
before accessing the device after power on, HW reset, or software
reset etc?

Does the device reliably enumerate on the bus, i.e. reading registers
2 and 3 to get its ID?

If the PHY is broken, by some yet to be determined definition of
broken, we try to limit the workaround to as narrow as possible. So it
should not be in the core probe code. It should be in the PHY specific
driver, and ideally for only its ID, not the whole vendors family of
PHYs.

  Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-17 12:16 [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible arturo.buzarra
  2023-03-17 17:05 ` Florian Fainelli
  2023-03-17 17:24 ` Andrew Lunn
@ 2023-03-17 18:21 ` Heiner Kallweit
  2023-03-20  9:45   ` Buzarra, Arturo
  2 siblings, 1 reply; 12+ messages in thread
From: Heiner Kallweit @ 2023-03-17 18:21 UTC (permalink / raw)
  To: arturo.buzarra
  Cc: netdev, Florian Fainelli, Andrew Lunn, Russell King - ARM Linux

On 17.03.2023 13:16, arturo.buzarra@digi.com wrote:
> From: Arturo Buzarra <arturo.buzarra@digi.com>
> 
> A PHY driver can dynamically determine the devices features, but in some
> circunstances, the PHY is not yet ready and the read capabilities does not fail
> but returns an undefined value, so incorrect capabilities are assumed and the
> initialization process fails. This commit postpones the PHY probe to ensure the
> PHY is accessible.
> 
To complement what has been said by Florian and Andrew:

"under some circumstances" is too vague in general. List potential such
circumstances and best what happened exactly in your case.

When genphy_read_abilities() is called the PHY has been accessed already,
reading the PHY ID. 

So best start with some details about your use case, which MAC, which PHY, etc.

> Signed-off-by: Arturo Buzarra <arturo.buzarra@digi.com>
> ---
>  drivers/net/phy/phy_device.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 1785f1cead97..f8c31e741936 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2628,10 +2628,14 @@ int genphy_read_abilities(struct phy_device *phydev)
>  			       phydev->supported);
>  
>  	val = phy_read(phydev, MII_BMSR);
>  	if (val < 0)
>  		return val;
> +	if (val == 0x0000 || val == 0xffff) {
> +		phydev_err(phydev, "PHY is not accessible\n");
> +		return -EPROBE_DEFER;
> +	}
>  
>  	linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->supported,
>  			 val & BMSR_ANEGCAPABLE);
>  
>  	linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, phydev->supported,


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-17 18:21 ` Heiner Kallweit
@ 2023-03-20  9:45   ` Buzarra, Arturo
  2023-03-20 10:12     ` Russell King (Oracle)
  2023-03-20 12:00     ` Andrew Lunn
  0 siblings, 2 replies; 12+ messages in thread
From: Buzarra, Arturo @ 2023-03-20  9:45 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: netdev, Florian Fainelli, Andrew Lunn, Russell King - ARM Linux

Hi,

I will try to answer all your questions:

- "We need more specifics here, what type of PHY device are you seeing this with?"
- " So best start with some details about your use case, which MAC, which PHY, etc"
I'm using a LAN8720A PHY (10/100 on RMII mode) with a ST MAC ( in particular is a stm32mp1 processor).
We have two PHYs one is a Gigabit PHY (RGMII mode) and the another one is a 10/100 (RMII mode).
In the boot process, I think that there is a race condition between configuring the Ethernet MACs and the two PHYs. At same point the RGMII Ethernet MAC is configured and starts the PHY probes.
When the 10/100 PHY starts the probe, it tries to read the genphy_read_abilities() and always reads 0xFFFF ( I assume that this is the default electrical values for that lines before it are configured).
At that point, the PHY initialization assumes like a valid value 0xFFFF and obviously it reports capabilities that the LAN8720A PHY does not have, like for example gigabit support.
A few seconds later, the Ethernet MAC for RMII mode is probed, but I think that it is too late ( we are working with the manufacturer to investigate it)

- " If your PHY requires "some time" before it can be ready you have a number of ways to achieve that"
I reviewed the PHY datasheet and we measure the power-on lines (power, clock, reset, etc.) and we are compliant with the power-on sequence.
Also, I implement a retry system to read several times the same MII_BMSR register, even with a long delays but never reads the right value, so I assume that it is not an issue in the PHY initialization.

- " Are you resetting the device via a GPIO? Turning a regulator off/one etc?"
As I said before, we check the power-on sequence and we match with the datasheet, for our test purposes we tried to perform a hardware reset (using a gpio for the reset line) but nothing helps here.

- " Does the device reliably enumerate on the bus, i.e. reading registers 2 and 3 to get its ID?"
Reading the registers PHYSID1 and PHYSID2 reports also 0xFFFF

- " If the PHY is broken, by some yet to be determined definition of broken, we try to limit the workaround to as narrow as possible. So it should not be in the core probe code. It should be in the PHY specific driver, and ideally for only its ID, not the whole vendors family of PHYs"
I have several workarounds/fixed for that, the easy way is set the PHY capabilities in the smsc.c driver " .features	= PHY_BASIC_FEATURES" like in the past and it works fine. Also I have another fix in the same driver adding a custom function for " get_features" where if I read 0xFFFF or 0x0000 return a EPROBE_DEFER. However after review another drivers (net/usb/pegasus.c , net/Ethernet/toshiba/spider_net.c, net/sis/sis190.c, and more...)  that also checks the result of read MII_BMSR against 0x0000 and 0xFFFF , I tried to send a common fix in the PHY core. From my point of view read a 0x0000 or 0xFFFF value is an error in the probe step like if the phy_read() returns an error, so I think that the PHY core should consider return a EPROBE_DEFER to at least provide a second try for that PHY device.

Thanks,

Arturo.

-----Original Message-----
From: Heiner Kallweit <hkallweit1@gmail.com> 
Sent: viernes, 17 de marzo de 2023 19:21
To: Buzarra, Arturo <Arturo.Buzarra@digi.com>
Cc: netdev@vger.kernel.org; Florian Fainelli <f.fainelli@gmail.com>; Andrew Lunn <andrew@lunn.ch>; Russell King - ARM Linux <linux@armlinux.org.uk>
Subject: Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible

[EXTERNAL E-MAIL] Warning! This email originated outside of the organization! Do not click links or open attachments unless you recognize the sender and know the content is safe.



On 17.03.2023 13:16, arturo.buzarra@digi.com wrote:
> From: Arturo Buzarra <arturo.buzarra@digi.com>
>
> A PHY driver can dynamically determine the devices features, but in 
> some circunstances, the PHY is not yet ready and the read capabilities 
> does not fail but returns an undefined value, so incorrect 
> capabilities are assumed and the initialization process fails. This 
> commit postpones the PHY probe to ensure the PHY is accessible.
>
To complement what has been said by Florian and Andrew:

"under some circumstances" is too vague in general. List potential such circumstances and best what happened exactly in your case.

When genphy_read_abilities() is called the PHY has been accessed already, reading the PHY ID.

So best start with some details about your use case, which MAC, which PHY, etc.

> Signed-off-by: Arturo Buzarra <arturo.buzarra@digi.com>
> ---
>  drivers/net/phy/phy_device.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/phy/phy_device.c 
> b/drivers/net/phy/phy_device.c index 1785f1cead97..f8c31e741936 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2628,10 +2628,14 @@ int genphy_read_abilities(struct phy_device *phydev)
>                              phydev->supported);
>
>       val = phy_read(phydev, MII_BMSR);
>       if (val < 0)
>               return val;
> +     if (val == 0x0000 || val == 0xffff) {
> +             phydev_err(phydev, "PHY is not accessible\n");
> +             return -EPROBE_DEFER;
> +     }
>
>       linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->supported,
>                        val & BMSR_ANEGCAPABLE);
>
>       linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, 
> phydev->supported,


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-20  9:45   ` Buzarra, Arturo
@ 2023-03-20 10:12     ` Russell King (Oracle)
  2023-03-20 12:00     ` Andrew Lunn
  1 sibling, 0 replies; 12+ messages in thread
From: Russell King (Oracle) @ 2023-03-20 10:12 UTC (permalink / raw)
  To: Buzarra, Arturo; +Cc: Heiner Kallweit, netdev, Florian Fainelli, Andrew Lunn

On Mon, Mar 20, 2023 at 09:45:38AM +0000, Buzarra, Arturo wrote:
> Hi,
> 
> I will try to answer all your questions:
> 
> - "We need more specifics here, what type of PHY device are you seeing this with?"
> - " So best start with some details about your use case, which MAC, which PHY, etc"
> I'm using a LAN8720A PHY (10/100 on RMII mode) with a ST MAC ( in particular is a stm32mp1 processor).

I can only find:

arch/arm/boot/dts/rk3066a-marsboard.dts:        lan8720a {
arch/arm/boot/dts/rk3188-radxarock.dts: lan8720a  {
arch/arm/boot/dts/imx6sx-softing-vining-2000.dts:                       /* LAN8720 PHY Reset */
arch/arm/boot/dts/imx6sx-softing-vining-2000.dts:                       /* LAN8720 PHY Reset */
arch/mips/boot/dts/ingenic/cu1000-neo.dts:      phy-handle = <&lan8720a>;
arch/mips/boot/dts/ingenic/cu1000-neo.dts:      lan8720a: ethernet-phy@0 {

using this PHY in the mainline kernel, none of them look like a stm32mp1
processor.

Please can you give details of how the MDIO bus is connected to this
PHY, in particular the hardware setup, and which driver is being used?

> We have two PHYs one is a Gigabit PHY (RGMII mode) and the another one is a 10/100 (RMII mode).
> In the boot process, I think that there is a race condition between configuring the Ethernet MACs and the two PHYs. At same point the RGMII Ethernet MAC is configured and starts the PHY probes.
> When the 10/100 PHY starts the probe, it tries to read the genphy_read_abilities() and always reads 0xFFFF ( I assume that this is the default electrical values for that lines before it are configured).
> At that point, the PHY initialization assumes like a valid value 0xFFFF and obviously it reports capabilities that the LAN8720A PHY does not have, like for example gigabit support.

So the questions that need to be asked are:

  What is causing 0xffff to be returned?

and two sub-questions on that which may help you answer it in a way
that is most relevant:

  Is it the PHY itself that is not responding?

  Is the MDIO driver not functional because the MAC or some resource it
  needs hasn't been setup, and thus is returning 0xffff?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-20  9:45   ` Buzarra, Arturo
  2023-03-20 10:12     ` Russell King (Oracle)
@ 2023-03-20 12:00     ` Andrew Lunn
  2023-03-23  8:02       ` Buzarra, Arturo
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2023-03-20 12:00 UTC (permalink / raw)
  To: Buzarra, Arturo
  Cc: Heiner Kallweit, netdev, Florian Fainelli, Russell King - ARM Linux

On Mon, Mar 20, 2023 at 09:45:38AM +0000, Buzarra, Arturo wrote:
> Hi,
> 
> I will try to answer all your questions:
> 
> - "We need more specifics here, what type of PHY device are you seeing this with?"
> - " So best start with some details about your use case, which MAC, which PHY, etc"
> I'm using a LAN8720A PHY (10/100 on RMII mode) with a ST MAC ( in particular is a stm32mp1 processor).
> We have two PHYs one is a Gigabit PHY (RGMII mode) and the another one is a 10/100 (RMII mode).

Where is the clock coming from? Does each PHY have its own crystal? Is
the clock coming from one of the MACs? Is one PHY a clock source and
the other a clock sync?

> In the boot process, I think that there is a race condition between
> configuring the Ethernet MACs and the two PHYs. At same point the
> RGMII Ethernet MAC is configured and starts the PHY probes.

You have two MACs. Do you have two MDIO busses, with one PHY on each
bus, or do you have just one MDIO bus in use, with two PHYs on it?

Please show us your Device Tree description of the hardware.

> When the 10/100 PHY starts the probe, it tries to read the
> genphy_read_abilities() and always reads 0xFFFF ( I assume that this
> is the default electrical values for that lines before it are
> configured).

There is a pull up on the MDIO data line, so that if nothing drives it
low, it reads 1. This was one of Florian comments. Have you check the
value of that pull up resistor?

> - " Does the device reliably enumerate on the bus, i.e. reading registers 2 and 3 to get its ID?"
> Reading the registers PHYSID1 and PHYSID2 reports also 0xFFFF

Which is odd, because the MDIO bus probe code would assume there is no
PHY there given those two values, and then would not bother trying to
read the abilities. So you are somehow forcing the core to assume
there is a PHY there. Do you have the PHY ID in DT?

> - " If the PHY is broken, by some yet to be determined definition of broken, we try to limit the workaround to as narrow as possible. So it should not be in the core probe code. It should be in the PHY specific driver, and ideally for only its ID, not the whole vendors family of PHYs"
> I have several workarounds/fixed for that, the easy way is set the PHY capabilities in the smsc.c driver " .features	= PHY_BASIC_FEATURES" like in the past and it works fine. Also I have another fix in the same driver adding a custom function for " get_features" where if I read 0xFFFF or 0x0000 return a EPROBE_DEFER. However after review another drivers (net/usb/pegasus.c , net/Ethernet/toshiba/spider_net.c, net/sis/sis190.c, and more...)  that also checks the result of read MII_BMSR against 0x0000 and 0xFFFF , I tried to send a common fix in the PHY core. From my point of view read a 0x0000 or 0xFFFF value is an error in the probe step like if the phy_read() returns an error, so I think that the PHY core should consider return a EPROBE_DEFER to at least provide a second try for that PHY device.

We first want to understand the problem before adding such hacks. It
really sounds like something the PHY needs is missing, a clock, time
after a reset, etc. If we can figure that out, we can avoid such
hacks.

	Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-20 12:00     ` Andrew Lunn
@ 2023-03-23  8:02       ` Buzarra, Arturo
  2023-03-23 14:19         ` Andrew Lunn
  0 siblings, 1 reply; 12+ messages in thread
From: Buzarra, Arturo @ 2023-03-23  8:02 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Heiner Kallweit, netdev, Florian Fainelli, Russell King - ARM Linux

Hi,

" You have two MACs. Do you have two MDIO busses, with one PHY on each bus, or do you have just one MDIO bus in use, with two PHYs on it?"
I have two Ethernet MACs, with one MDIO bus connected to two different PHYs

"There is a pull up on the MDIO data line, so that if nothing drives it low, it reads 1. This was one of Florian comments. Have you check the value of that pull up resistor?"
Yes, but with/without this pull-up we obtain the same behavior

"Which is odd, because the MDIO bus probe code would assume there is no PHY there given those two values, and then would not bother trying to read the abilities. So you are somehow forcing the core to assume there is a PHY there. Do you have the PHY ID in DT?"
Yes, we have both PHYs defined in the DT

"Where is the clock coming from? Does each PHY have its own crystal? Is the clock coming from one of the MACs? Is one PHY a clock source and the other a clock sync?"
Gigabit PHY has its own Crystal, however the 10/100 PHY uses a clock from the CPU and it is the root cause of the issue because when Linux asks the PHY capabilities the clock is not ready yet.

"We first want to understand the problem before adding such hacks. It really sounds like something the PHY needs is missing, a clock, time after a reset, etc. If we can figure that out, we can avoid such hacks"
We have identified the root cause of the 0xFFFF issue, it is because the two PHYs are defined in the same MDIO bus node inside the first Ethernet MAC node, and when the 10/100 PHY is probed the PHY Clock from the CPU is not ready, this is the DT definition:
---------
/* Gigabit Ethernet */
&eth1 {
	status = "okay";
	pinctrl-0 = <&eth1_rgmii_pins>;
	pinctrl-names = "default";
	phy-mode = "rgmii-id";
	max-speed = <1000>;
	phy-handle = <&phy0_eth1>;

	mdio1 {
		#address-cells = <1>;
		#size-cells = <0>;
		compatible = "snps,dwmac-mdio";

		phy0_eth1: ethernet-phy@0 {
			reg = <0>;
			compatible = "ethernet-phy-id0141.0dd0"; /* PHY ID for Marvell 88E1512 */
			reset-gpios = <&gpioi 2 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>;
			reset-assert-us = <1000>;
			reset-deassert-us = <2000>;
		};

		phy0_eth2: ethernet-phy@1 {
			reg = <1>;
			compatible = "ethernet-phy-id0007.c0f0"; /* PHY ID for SMSC LAN8720Ai */
			reset-gpios = <&gpioh 7 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>;
			interrupt-parent = <&gpioh>;
			interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
		};
	};
};

/* 10/100 Ethernet */
&eth2 {
	status = "okay";
	pinctrl-0 = <&eth2_rmii_pins>;
	pinctrl-names = "default";
	phy-mode = "rmii";
	max-speed = <100>;
	phy-handle = <&phy0_eth2>;
	st,ext-phyclk;
};
---------
This is the power-up sequence:
- ST MAC driver (dwmac-stm32.c) initializes the first Ethernet interface in RGMII mode
- Linux kernel perform the Gigabit PHY probe
- Linux kernel perform the 10/100 PHY probe ( and reads a wrong value from the MII_BMSR register because the PHY clock from the CPU is not ready)
- ST MAC driver (dwmac-stm32.c) initializes the second Ethernet interface in RMII mode (Here the CPU initializes the PHY Clock)

So the 10/100 PHY is probed BEFORE the PHY Clock is initialized.

In spite of this corner case issue that we have with our particular setup, I think that consider a 0x0000 or 0xFFFF read value as a valid value is wrong. For our issue I have several fixes: hardcoded the PHY capabilities in the smsc.c driver with " .features  = PHY_BASIC_FEATURES" , another fix is in the same driver adding a custom function for " .get_features" where if I read 0xFFFF or 0x0000 return a EPROBE_DEFER. But the real motivation to send that patch was that after review several drivers that also checks the result of read MII_BMSR against 0x0000 and 0xFFFF , I tried to send a common fix in the PHY core, to avoid maintain this verification in different drivers.

Thanks,

Arturo.

-----Original Message-----
From: Andrew Lunn <andrew@lunn.ch> 
Sent: lunes, 20 de marzo de 2023 13:01
To: Buzarra, Arturo <Arturo.Buzarra@digi.com>
Cc: Heiner Kallweit <hkallweit1@gmail.com>; netdev@vger.kernel.org; Florian Fainelli <f.fainelli@gmail.com>; Russell King - ARM Linux <linux@armlinux.org.uk>
Subject: Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible

[EXTERNAL E-MAIL] Warning! This email originated outside of the organization! Do not click links or open attachments unless you recognize the sender and know the content is safe.



On Mon, Mar 20, 2023 at 09:45:38AM +0000, Buzarra, Arturo wrote:
> Hi,
>
> I will try to answer all your questions:
>
> - "We need more specifics here, what type of PHY device are you seeing this with?"
> - " So best start with some details about your use case, which MAC, which PHY, etc"
> I'm using a LAN8720A PHY (10/100 on RMII mode) with a ST MAC ( in particular is a stm32mp1 processor).
> We have two PHYs one is a Gigabit PHY (RGMII mode) and the another one is a 10/100 (RMII mode).

Where is the clock coming from? Does each PHY have its own crystal? Is the clock coming from one of the MACs? Is one PHY a clock source and the other a clock sync?

> In the boot process, I think that there is a race condition between 
> configuring the Ethernet MACs and the two PHYs. At same point the 
> RGMII Ethernet MAC is configured and starts the PHY probes.

You have two MACs. Do you have two MDIO busses, with one PHY on each bus, or do you have just one MDIO bus in use, with two PHYs on it?

Please show us your Device Tree description of the hardware.

> When the 10/100 PHY starts the probe, it tries to read the
> genphy_read_abilities() and always reads 0xFFFF ( I assume that this 
> is the default electrical values for that lines before it are 
> configured).

There is a pull up on the MDIO data line, so that if nothing drives it low, it reads 1. This was one of Florian comments. Have you check the value of that pull up resistor?

> - " Does the device reliably enumerate on the bus, i.e. reading registers 2 and 3 to get its ID?"
> Reading the registers PHYSID1 and PHYSID2 reports also 0xFFFF

Which is odd, because the MDIO bus probe code would assume there is no PHY there given those two values, and then would not bother trying to read the abilities. So you are somehow forcing the core to assume there is a PHY there. Do you have the PHY ID in DT?

> - " If the PHY is broken, by some yet to be determined definition of broken, we try to limit the workaround to as narrow as possible. So it should not be in the core probe code. It should be in the PHY specific driver, and ideally for only its ID, not the whole vendors family of PHYs"
> I have several workarounds/fixed for that, the easy way is set the PHY capabilities in the smsc.c driver " .features  = PHY_BASIC_FEATURES" like in the past and it works fine. Also I have another fix in the same driver adding a custom function for " get_features" where if I read 0xFFFF or 0x0000 return a EPROBE_DEFER. However after review another drivers (net/usb/pegasus.c , net/Ethernet/toshiba/spider_net.c, net/sis/sis190.c, and more...)  that also checks the result of read MII_BMSR against 0x0000 and 0xFFFF , I tried to send a common fix in the PHY core. From my point of view read a 0x0000 or 0xFFFF value is an error in the probe step like if the phy_read() returns an error, so I think that the PHY core should consider return a EPROBE_DEFER to at least provide a second try for that PHY device.

We first want to understand the problem before adding such hacks. It really sounds like something the PHY needs is missing, a clock, time after a reset, etc. If we can figure that out, we can avoid such hacks.

        Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-23  8:02       ` Buzarra, Arturo
@ 2023-03-23 14:19         ` Andrew Lunn
  2023-03-23 14:35           ` Russell King (Oracle)
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2023-03-23 14:19 UTC (permalink / raw)
  To: Buzarra, Arturo
  Cc: Heiner Kallweit, netdev, Florian Fainelli, Russell King - ARM Linux

> Gigabit PHY has its own Crystal, however the 10/100 PHY uses a clock
> from the CPU and it is the root cause of the issue because when
> Linux asks the PHY capabilities the clock is not ready yet.

O.K, now we are getting closer.

Which clock is it exactly? Both for the MAC and the PHY?

> We have identified the root cause of the 0xFFFF issue, it is because
> the two PHYs are defined in the same MDIO bus node inside the first
> Ethernet MAC node, and when the 10/100 PHY is probed the PHY Clock
> from the CPU is not ready, this is the DT definition:


> ---------
> /* Gigabit Ethernet */
> &eth1 {
> 	status = "okay";
> 	pinctrl-0 = <&eth1_rgmii_pins>;
> 	pinctrl-names = "default";
> 	phy-mode = "rgmii-id";
> 	max-speed = <1000>;
> 	phy-handle = <&phy0_eth1>;
> 
> 	mdio1 {
> 		#address-cells = <1>;
> 		#size-cells = <0>;
> 		compatible = "snps,dwmac-mdio";
> 
> 		phy0_eth1: ethernet-phy@0 {
> 			reg = <0>;
> 			compatible = "ethernet-phy-id0141.0dd0"; /* PHY ID for Marvell 88E1512 */
> 			reset-gpios = <&gpioi 2 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>;
> 			reset-assert-us = <1000>;
> 			reset-deassert-us = <2000>;
> 		};
> 
> 		phy0_eth2: ethernet-phy@1 {
> 			reg = <1>;
> 			compatible = "ethernet-phy-id0007.c0f0"; /* PHY ID for SMSC LAN8720Ai */
> 			reset-gpios = <&gpioh 7 (GPIO_ACTIVE_LOW | GPIO_PULL_UP)>;
> 			interrupt-parent = <&gpioh>;
> 			interrupts = <2 IRQ_TYPE_LEVEL_LOW>;
> 		};
> 	};
> };

This all looks reasonable.


> /* 10/100 Ethernet */
> &eth2 {
> 	status = "okay";
> 	pinctrl-0 = <&eth2_rmii_pins>;
> 	pinctrl-names = "default";
> 	phy-mode = "rmii";
> 	max-speed = <100>;
> 	phy-handle = <&phy0_eth2>;
> 	st,ext-phyclk;
> };

st,ext-phyclk is interesting. But looking at the code, that means the
clock is coming from somewhere else. And there appears to be eth-ck,
which the driver does a

devm_clk_get() on.

Is eth-ck being fed to both the MAC and the PHY? Is this the missing
clock?

So this is what we need to fix. And the correct way to do this is to
express this clock in DT. If you look at the PHY driver smsc, in its
probe function it has:

       /* Make clk optional to keep DTB backward compatibility. */
        refclk = devm_clk_get_optional_enabled(dev, NULL);
        if (IS_ERR(refclk))
                return dev_err_probe(dev, PTR_ERR(refclk),
                                     "Failed to request clock\n");

If there is a clock in DT, it will try to enable it. If the clock does
not exist yet, it will fail the probe with -EPRODE_DEFER, and the core
will try the probe again later, hopefully once the clock exists.

So this is the clock consumer. You also need a clock provider. What
exactly is this clock? We need the answer to the questions above,
but... If it is a SoC clock, it is quite likely there already is a
clock provider for it. If it is a clock in the MAC, you need to extend
the MAC driver to implement a clock provider. Maybe
meson8b_dwmac_register_clk() is a useful example?

	Andrew



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-23 14:19         ` Andrew Lunn
@ 2023-03-23 14:35           ` Russell King (Oracle)
  2023-03-30  7:46             ` Buzarra, Arturo
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King (Oracle) @ 2023-03-23 14:35 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Buzarra, Arturo, Heiner Kallweit, netdev, Florian Fainelli

On Thu, Mar 23, 2023 at 03:19:21PM +0100, Andrew Lunn wrote:
> > Gigabit PHY has its own Crystal, however the 10/100 PHY uses a clock
> > from the CPU and it is the root cause of the issue because when
> > Linux asks the PHY capabilities the clock is not ready yet.
> 
> O.K, now we are getting closer.
> 
> Which clock is it exactly? Both for the MAC and the PHY?

Just a passing observation but... considering stmmac needs the clock
from the PHY in order to do even basic things, it doesn't surprise me
that there's a PHY out there that doesn't work without a clock provided
from the "other side" to also do the most basic things such as read the
IDs!

Hardware folk always find wonderful ways to break stuff and then need
software to fix it... :/

That all said, if the clock that's being discussed is the MDC signal
(MDIO interface clock) then /really/ (and obviously) that's a matter
for the MDIO driver to ensure that clock is available to run before
registering itself.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-23 14:35           ` Russell King (Oracle)
@ 2023-03-30  7:46             ` Buzarra, Arturo
  2023-03-30 12:01               ` Andrew Lunn
  0 siblings, 1 reply; 12+ messages in thread
From: Buzarra, Arturo @ 2023-03-30  7:46 UTC (permalink / raw)
  To: Russell King, Andrew Lunn; +Cc: Heiner Kallweit, netdev, Florian Fainelli

Hi,

" Which clock is it exactly? Both for the MAC and the PHY?"
Yes, you can clock both from a crystal or like in this case from the SoC.

" Is eth-ck being fed to both the MAC and the PHY? Is this the missing clock?"
Yes, It is the clock that only is available when the second Ethernet MAC (RMII) is initialized

" What exactly is this clock?"
This clock cames from the SoC, it is initialized in stm32mp1_set_mode() with the register value " val |= dwmac->ops->pmcsetr.eth2_ref_clk_sel;"

" Maybe meson8b_dwmac_register_clk() is a useful example?"
After reviewing your suggestion, I saw that the MAC driver "dwmac-stm32.c" doesn't have devm_clk_register(), so I'm assuming it doesn't register it as a clock provider. 

I will try to work on that way to create a clock dependency between MAC and PHY.

Regardless, what do you think reading 0x0000 or 0xFFFF should be considered as a PHY error?

Thanks,

Arturo.

-----Original Message-----
From: Russell King <linux@armlinux.org.uk> 
Sent: jueves, 23 de marzo de 2023 15:36
To: Andrew Lunn <andrew@lunn.ch>
Cc: Buzarra, Arturo <Arturo.Buzarra@digi.com>; Heiner Kallweit <hkallweit1@gmail.com>; netdev@vger.kernel.org; Florian Fainelli <f.fainelli@gmail.com>
Subject: Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible

[EXTERNAL E-MAIL] Warning! This email originated outside of the organization! Do not click links or open attachments unless you recognize the sender and know the content is safe.



On Thu, Mar 23, 2023 at 03:19:21PM +0100, Andrew Lunn wrote:
> > Gigabit PHY has its own Crystal, however the 10/100 PHY uses a clock 
> > from the CPU and it is the root cause of the issue because when 
> > Linux asks the PHY capabilities the clock is not ready yet.
>
> O.K, now we are getting closer.
>
> Which clock is it exactly? Both for the MAC and the PHY?

Just a passing observation but... considering stmmac needs the clock from the PHY in order to do even basic things, it doesn't surprise me that there's a PHY out there that doesn't work without a clock provided from the "other side" to also do the most basic things such as read the IDs!

Hardware folk always find wonderful ways to break stuff and then need software to fix it... :/

That all said, if the clock that's being discussed is the MDC signal (MDIO interface clock) then /really/ (and obviously) that's a matter for the MDIO driver to ensure that clock is available to run before registering itself.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible
  2023-03-30  7:46             ` Buzarra, Arturo
@ 2023-03-30 12:01               ` Andrew Lunn
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Lunn @ 2023-03-30 12:01 UTC (permalink / raw)
  To: Buzarra, Arturo; +Cc: Russell King, Heiner Kallweit, netdev, Florian Fainelli

> Regardless, what do you think reading 0x0000 or 0xFFFF should be
> considered as a PHY error?

It is clearly a hack to work around the missing clock. Sorry, but no.

   Andrew

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-03-30 12:01 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17 12:16 [PATCH] net: phy: return EPROBE_DEFER if PHY is not accessible arturo.buzarra
2023-03-17 17:05 ` Florian Fainelli
2023-03-17 17:24 ` Andrew Lunn
2023-03-17 18:21 ` Heiner Kallweit
2023-03-20  9:45   ` Buzarra, Arturo
2023-03-20 10:12     ` Russell King (Oracle)
2023-03-20 12:00     ` Andrew Lunn
2023-03-23  8:02       ` Buzarra, Arturo
2023-03-23 14:19         ` Andrew Lunn
2023-03-23 14:35           ` Russell King (Oracle)
2023-03-30  7:46             ` Buzarra, Arturo
2023-03-30 12:01               ` Andrew Lunn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.