* [PATCH net 0/2]
@ 2021-11-22 23:51 Marek Behún
  2021-11-22 23:51 ` [PATCH net 1/2] net: phylink: Force link down and retrigger resolve on interface change Marek Behún
  2021-11-22 23:51 ` [PATCH net 2/2] net: phylink: Force retrigger in case of latched link-fail indicator Marek Behún
  0 siblings, 2 replies; 7+ messages in thread
From: Marek Behún @ 2021-11-22 23:51 UTC (permalink / raw)
  To: netdev; +Cc: Russell King, Jakub Kicinski, Andrew Lunn, davem, Marek Behún

With information from me and my nagging, Russell has produced two fixes
for phylink, which add code that triggers another phylink_resolve() from
phylink_resolve() if certain conditions are met:
  interface is being changed
or
  link is down and previous link was up
These are needed because sometimes the PCS callbacks may provide stale
values if link / speed / ...
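
As a rough illustration, the two conditions listed above can be written as a standalone C predicate. This is a sketch only: the struct and function names are hypothetical, and in the actual patches the checks sit inline in phylink_resolve() and also depend on the AN mode and on whether a PHY is attached.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what one phylink_resolve() pass knows.
 * Illustrative only; these names do not exist in phylink. */
struct resolve_snapshot {
	bool interface_changed; /* PHY now reports a different interface mode */
	bool link_up;           /* link state read during this pass */
	bool prev_link_up;      /* link state reported by the previous pass */
};

/* Returns true when the pass should queue another resolve. */
static bool needs_retrigger(const struct resolve_snapshot *s)
{
	/* Condition 1: the interface is being changed. */
	if (s->interface_changed)
		return true;

	/* Condition 2: the link is down now but was up before. */
	return !s->link_up && s->prev_link_up;
}
```

For example, an interface change always asks for another pass, while a pass that finds the link down when it was already down does not.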

Marek Behún (2):
  net: phylink: Force link down and retrigger resolve on interface
    change
  net: phylink: Force retrigger in case of latched link-fail indicator

 drivers/net/phy/phylink.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

-- 
2.32.0



* [PATCH net 1/2] net: phylink: Force link down and retrigger resolve on interface change
  2021-11-22 23:51 [PATCH net 0/2] Marek Behún
@ 2021-11-22 23:51 ` Marek Behún
  2021-11-23 11:20   ` Russell King (Oracle)
  2021-11-22 23:51 ` [PATCH net 2/2] net: phylink: Force retrigger in case of latched link-fail indicator Marek Behún
  1 sibling, 1 reply; 7+ messages in thread
From: Marek Behún @ 2021-11-22 23:51 UTC (permalink / raw)
  To: netdev; +Cc: Russell King, Jakub Kicinski, Andrew Lunn, davem, Marek Behún

On PHY state change, the phylink_resolve() function can read stale
information from the MAC and report incorrect link speed and duplex to
the kernel message log.

Example with a Marvell 88X3310 PHY connected to a SerDes port on Marvell
88E6393X switch:
- PHY driver triggers state change due to PHY interface mode being
  changed from 10gbase-r to 2500base-x because the copper speed changed
  from 10Gbps to 2.5Gbps, but the PHY itself either hasn't yet changed
  its interface to the host, or the interrupt about loss of SerDes link
  hasn't arrived yet (there can be a delay of several milliseconds for
  this), so we still think that the 10gbase-r mode is up
- phylink_resolve()
  - phylink_mac_pcs_get_state()
    - this fills in speed=10g link=up
  - interface mode is updated to 2500base-x but speed is left at 10Gbps
  - phylink_major_config()
    - interface is changed to 2500base-x
  - phylink_link_up()
    - mv88e6xxx_mac_link_up()
      - .port_set_speed_duplex()
        - speed is set to 10Gbps
    - reports "Link is Up - 10Gbps/Full" to dmesg

Afterwards, when the interrupt for mv88e6xxx finally arrives, another
resolve is forced in which we get the correct speed from
phylink_mac_pcs_get_state(), but since the interface is no longer
being changed, we don't call phylink_major_config() but only
phylink_mac_config(), which no longer sets speed/duplex.

To fix this, we need to force the link down and trigger another resolve
on a PHY interface change event.
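
The sequence above can be reproduced with a small, hypothetical C model (a sketch, not phylink code). It assumes that the PCS answers for whichever interface mode it is currently configured in, so a read taken before phylink_major_config() still sees the old 10gbase-r link; that speed is only programmed on a down-to-up transition; and that a queued resolve runs after the reconfiguration. All type and function names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of the race above; none of
 * these names exist in phylink or mv88e6xxx. */
enum model_iface { IF_10GBASER, IF_2500BASEX };

struct link_model {
	enum model_iface mac_iface; /* mode currently programmed on MAC/PCS */
	enum model_iface phy_iface; /* mode the PHY driver now reports */
	bool link_up;               /* last link state phylink reported */
	int speed;                  /* speed programmed on the last link up */
	bool requeue;               /* this pass asked for another resolve */
};

/* Assumption: the PCS answers for the mode it is configured in, so
 * before reconfiguration it still reports the old 10gbase-r link. */
static void model_pcs_state(const struct link_model *m, bool *link,
			    int *speed)
{
	*link = true;
	*speed = (m->mac_iface == IF_10GBASER) ? 10000 : 2500;
}

static void model_resolve(struct link_model *m, bool with_fix)
{
	bool link, retrigger = false;
	int speed;

	/* The PCS is read before any interface change is applied. */
	model_pcs_state(m, &link, &speed);

	if (m->phy_iface != m->mac_iface) {
		if (with_fix) {
			retrigger = true;
			link = false; /* force link down, resolve again */
		}
		m->link_up = false;          /* link is taken down for the change */
		m->mac_iface = m->phy_iface; /* stands in for phylink_major_config() */
	}

	/* Speed is only programmed on a down-to-up transition. */
	if (link != m->link_up) {
		m->link_up = link;
		if (link)
			m->speed = speed;
	}

	m->requeue = !link && retrigger;
}
```

With with_fix false, two calls to model_resolve() leave speed at 10000 even though the PCS reports 2500 on the second pass. With with_fix true, the first pass forces the link down and sets requeue, and the second pass brings the link up at 2500.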

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
---
 drivers/net/phy/phylink.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 3603c024109a..5b8b61daeb98 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -963,6 +963,7 @@ static void phylink_resolve(struct work_struct *w)
 	struct phylink_link_state link_state;
 	struct net_device *ndev = pl->netdev;
 	bool mac_config = false;
+	bool retrigger = false;
 	bool cur_link_state;
 
 	mutex_lock(&pl->state_mutex);
@@ -976,6 +977,7 @@ static void phylink_resolve(struct work_struct *w)
 		link_state.link = false;
 	} else if (pl->mac_link_dropped) {
 		link_state.link = false;
+		retrigger = true;
 	} else {
 		switch (pl->cur_link_an_mode) {
 		case MLO_AN_PHY:
@@ -1000,6 +1002,15 @@ static void phylink_resolve(struct work_struct *w)
 
 			/* Only update if the PHY link is up */
 			if (pl->phydev && pl->phy_state.link) {
+				/* If the interface has changed, force a
+				 * link down event if the link isn't already
+				 * down, and re-resolve.
+				 */
+				if (link_state.interface !=
+				    pl->phy_state.interface) {
+					retrigger = true;
+					link_state.link = false;
+				}
 				link_state.interface = pl->phy_state.interface;
 
 				/* If we have a PHY, we need to update with
@@ -1042,7 +1053,7 @@ static void phylink_resolve(struct work_struct *w)
 		else
 			phylink_link_up(pl, link_state);
 	}
-	if (!link_state.link && pl->mac_link_dropped) {
+	if (!link_state.link && retrigger) {
 		pl->mac_link_dropped = false;
 		queue_work(system_power_efficient_wq, &pl->resolve);
 	}

* [PATCH net 2/2] net: phylink: Force retrigger in case of latched link-fail indicator
  2021-11-22 23:51 [PATCH net 0/2] Marek Behún
  2021-11-22 23:51 ` [PATCH net 1/2] net: phylink: Force link down and retrigger resolve on interface change Marek Behún
@ 2021-11-22 23:51 ` Marek Behún
  2021-11-23 11:21   ` Russell King (Oracle)
  2021-11-23 11:40   ` Russell King (Oracle)
  1 sibling, 2 replies; 7+ messages in thread
From: Marek Behún @ 2021-11-22 23:51 UTC (permalink / raw)
  To: netdev; +Cc: Russell King, Jakub Kicinski, Andrew Lunn, davem, Marek Behún

On mv88e6xxx 1G/2.5G PCS, the SerDes register 4.2001.2 has the following
description:
  This register bit indicates when link was lost since the last
  read. For the current link status, read this register
  back-to-back.

Thus, to get the current link state, we need to read the register twice.
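
A minimal sketch of the latching-low behaviour, assuming the bit works as the quoted description says: a read returns 0 if the link was lost at any point since the previous read, and each read re-arms the latch with the live state. The model is hypothetical and does not touch real mv88e6xxx registers.

```c
#include <stdbool.h>

/* Hypothetical model of a latching-low link status bit. */
struct latched_status {
	bool live_link;  /* current state of the SerDes link */
	bool latched_ok; /* cleared by any link loss until the next read */
};

/* Hardware side: the link changes state; a loss clears the latch. */
static void link_event(struct latched_status *r, bool up)
{
	r->live_link = up;
	if (!up)
		r->latched_ok = false;
}

/* A read returns the latched value, then re-arms the latch with the
 * live link state. */
static bool read_link_bit(struct latched_status *r)
{
	bool val = r->latched_ok;

	r->latched_ok = r->live_link;
	return val;
}
```

After a drop and recovery, the first read_link_bit() returns false (the latched loss) and the back-to-back second read returns true, the current state.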

But doing that in the link change interrupt handler would lead to
potentially ignoring link down events, which we really want to avoid.

Thus this needs to be solved in phylink's resolve, by retriggering
another resolve whenever the PCS reports link down while the previous
link state was up.

The wrong value is read when phylink requests a change from sgmii to
2500base-x mode, after which the link won't come up. This fixes the bug.
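
The effect of the retrigger can be sketched by pairing the same kind of latch with a single-pass resolve. This is a hypothetical model, not phylink code: latched_resolve() stands in for the MLO_AN_INBAND path, queued stands in for queue_work() on pl->resolve, and the scenario assumes the link was up, dropped and recovered before the resolve ran, so the latch still holds 0.

```c
#include <stdbool.h>

/* Hypothetical PCS with a latching-low link bit. */
struct pcs_model {
	bool live_link;  /* current SerDes link state */
	bool latched_ok; /* cleared by any link loss until the next read */
};

static bool pcs_read_link(struct pcs_model *p)
{
	bool val = p->latched_ok;

	p->latched_ok = p->live_link; /* a read re-arms the latch */
	return val;
}

/* Hypothetical stand-in for the phylink state touched here. */
struct resolver {
	bool cur_link; /* link state reported by the previous pass */
	bool queued;   /* stands in for queue_work(..., &pl->resolve) */
};

static void latched_resolve(struct resolver *r, struct pcs_model *p,
			    bool with_fix)
{
	bool link = pcs_read_link(p);
	bool retrigger = false;

	/* The fix: a down report after an up state may be a latched loss. */
	if (with_fix && !link && r->cur_link)
		retrigger = true;

	r->cur_link = link;
	r->queued = !link && retrigger;
}
```

Without the fix, the single pass reports link down and nothing queues another one, so the link stays down. With the fix, the first pass reports link down and queues a second pass, whose read returns the live (up) state.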

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Marek Behún <kabel@kernel.org>
---
 drivers/net/phy/phylink.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 5b8b61daeb98..c6b5d5af8817 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -994,6 +994,12 @@ static void phylink_resolve(struct work_struct *w)
 		case MLO_AN_INBAND:
 			phylink_mac_pcs_get_state(pl, &link_state);
 
+			/* The PCS may have a latching link-fail indicator.
+			 * If the PCS link goes down, retrigger a resolve.
+			 */
+			if (!link_state.link && cur_link_state)
+				retrigger = true;
+
 			/* If we have a phy, the "up" state is the union of
 			 * both the PHY and the MAC
 			 */

* Re: [PATCH net 1/2] net: phylink: Force link down and retrigger resolve on interface change
  2021-11-22 23:51 ` [PATCH net 1/2] net: phylink: Force link down and retrigger resolve on interface change Marek Behún
@ 2021-11-23 11:20   ` Russell King (Oracle)
  2021-11-23 12:12     ` Marek Behún
  0 siblings, 1 reply; 7+ messages in thread
From: Russell King (Oracle) @ 2021-11-23 11:20 UTC (permalink / raw)
  To: Marek Behún; +Cc: netdev, Jakub Kicinski, Andrew Lunn, davem

On Tue, Nov 23, 2021 at 12:51:53AM +0100, Marek Behún wrote:
> Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Marek Behún <kabel@kernel.org>

I'm pretty sure someone will highlight that the author of the patch
should be the first sign-off - which doesn't match given the way
you've sent this patch. That probably needs fixing before it's
applied.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH net 2/2] net: phylink: Force retrigger in case of latched link-fail indicator
  2021-11-22 23:51 ` [PATCH net 2/2] net: phylink: Force retrigger in case of latched link-fail indicator Marek Behún
@ 2021-11-23 11:21   ` Russell King (Oracle)
  2021-11-23 11:40   ` Russell King (Oracle)
  1 sibling, 0 replies; 7+ messages in thread
From: Russell King (Oracle) @ 2021-11-23 11:21 UTC (permalink / raw)
  To: Marek Behún; +Cc: netdev, Jakub Kicinski, Andrew Lunn, davem

On Tue, Nov 23, 2021 at 12:51:54AM +0100, Marek Behún wrote:
> Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Marek Behún <kabel@kernel.org>

Same issue with this patch wrt authorship vs sign-off.


* Re: [PATCH net 2/2] net: phylink: Force retrigger in case of latched link-fail indicator
  2021-11-22 23:51 ` [PATCH net 2/2] net: phylink: Force retrigger in case of latched link-fail indicator Marek Behún
  2021-11-23 11:21   ` Russell King (Oracle)
@ 2021-11-23 11:40   ` Russell King (Oracle)
  1 sibling, 0 replies; 7+ messages in thread
From: Russell King (Oracle) @ 2021-11-23 11:40 UTC (permalink / raw)
  To: Marek Behún; +Cc: netdev, Jakub Kicinski, Andrew Lunn, davem

On Tue, Nov 23, 2021 at 12:51:54AM +0100, Marek Behún wrote:
> On mv88e6xxx 1G/2.5G PCS, the SerDes register 4.2001.2 has the following
> description:
>   This register bit indicates when link was lost since the last
>   read. For the current link status, read this register
>   back-to-back.
> 
> Thus, to get the current link state, we need to read the register twice.
> 
> But doing that in the link change interrupt handler would lead to
> potentially ignoring link down events, which we really want to avoid.
> 
> Thus this needs to be solved in phylink's resolve, by retriggering
> another resolve whenever the PCS reports link down while the previous
> link state was up.
> 
> The wrong value is read when phylink requests a change from sgmii to
> 2500base-x mode, after which the link won't come up. This fixes the bug.

I've also been re-thinking this patch - I don't think it's sufficient
to completely solve the problem, and I think this is required to make
it bullet-proof.

I suspect the reason no problem is being seen is that, normally, the
BMSR is read before phylink_mac_change() is called, and that read
"unlatches" the bit.

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 73cb97285caa..47fe16b4e387 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1097,10 +1097,17 @@ static void phylink_resolve(struct work_struct *w)
 			phylink_mac_pcs_get_state(pl, &link_state);
 
 			/* The PCS may have a latching link-fail indicator.
-			 * If the PCS link goes down, retrigger a resolve.
+			 * If the link was up, bring the link down and
+			 * re-trigger the resolve. Otherwise, re-read the
+			 * PCS state to get the current status of the link.
 			 */
-			if (!link_state.link && cur_link_state)
-				retrigger = true;
+			if (!link_state.link) {
+				if (cur_link_state)
+					retrigger = true;
+				else
+					phylink_mac_pcs_get_state(pl,
+								  &link_state);
+			}
 
 			/* If we have a phy, the "up" state is the union of
 			 * both the PHY and the MAC

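The case the revised check above targets can be sketched with a similar hypothetical latch model. It assumes the link was down, has since come up, and the latch still holds a stale 0 because nothing read the status register before the resolve ran. resolve_v1() mirrors the check in patch 2/2 and resolve_v2() mirrors the proposed revision; the names and types are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical latching-low link bit: reads return the latched value
 * and re-arm the latch with the live state. */
struct pcs_latch {
	bool live_link;
	bool latched_ok;
};

static bool latch_read(struct pcs_latch *p)
{
	bool val = p->latched_ok;

	p->latched_ok = p->live_link;
	return val;
}

struct resolve_state {
	bool cur_link; /* link state reported by the previous pass */
	bool queued;   /* another pass has been requested */
};

/* Check as in patch 2/2: retrigger only on an up-to-down report. */
static void resolve_v1(struct resolve_state *r, struct pcs_latch *p)
{
	bool link = latch_read(p);
	bool retrigger = !link && r->cur_link;

	r->cur_link = link;
	r->queued = !link && retrigger;
}

/* Revised check: if the link was already down, read again at once so
 * that a stale latched 0 is replaced by the current status. */
static void resolve_v2(struct resolve_state *r, struct pcs_latch *p)
{
	bool link = latch_read(p);
	bool retrigger = false;

	if (!link) {
		if (r->cur_link)
			retrigger = true;
		else
			link = latch_read(p);
	}

	r->cur_link = link;
	r->queued = !link && retrigger;
}
```

In that scenario resolve_v1() reports the link down and queues nothing, whereas resolve_v2() performs the second read in the same pass and reports the link up. When the previous state was up, both behave alike: report down and queue another pass.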

* Re: [PATCH net 1/2] net: phylink: Force link down and retrigger resolve on interface change
  2021-11-23 11:20   ` Russell King (Oracle)
@ 2021-11-23 12:12     ` Marek Behún
  0 siblings, 0 replies; 7+ messages in thread
From: Marek Behún @ 2021-11-23 12:12 UTC (permalink / raw)
  To: Russell King (Oracle); +Cc: netdev, Jakub Kicinski, Andrew Lunn, davem

On Tue, 23 Nov 2021 11:20:59 +0000
"Russell King (Oracle)" <linux@armlinux.org.uk> wrote:

> On Tue, Nov 23, 2021 at 12:51:53AM +0100, Marek Behún wrote:
> > Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> > Signed-off-by: Marek Behún <kabel@kernel.org>  
> 
> I'm pretty sure someone will highlight that the author of the patch
> should be the first sign-off - which doesn't match given the way
> you've sent this patch. That probably needs fixing before it's
> applied.
> 

Hmm. Well, you're the author of the patch; I only wrote the commit
message. But I forgot to set --author in git commit. I shall resend
this.

Marek


