[PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

* [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state
@ 2020-05-30 21:43 Vladimir Oltean
  2020-05-30 22:36 ` Heiner Kallweit
  2020-05-31  0:18 ` Russell King - ARM Linux admin
  0 siblings, 2 replies; 9+ messages in thread
From: Vladimir Oltean @ 2020-05-30 21:43 UTC (permalink / raw)
  To: stable, gregkh, netdev, andrew, f.fainelli, hkallweit1, linux,
	davem, kuba
  Cc: linux-kernel

From: Vladimir Oltean <vladimir.oltean@nxp.com>

In kernel 4.19 (and probably earlier too) there are issues surrounding
the PHY_AN state.

For example, if a PHY is in PHY_AN state and AN has not finished, then
what is supposed to happen is that the state machine gets rescheduled
until it is, or until the link_timeout reaches zero which triggers an
autoneg restart process.

But actually the rescheduling never works if the PHY uses interrupts,
because the condition under which rescheduling occurs is just if
phy_polling_mode() is true. So basically, this whole rescheduling
functionality works for AN-not-yet-complete just by mistake. Let me
explain.

Most of the time the AN process manages to finish by the time the
interrupt has triggered. One might say "that should always be the case,
otherwise the PHY wouldn't raise the interrupt, right?".
Well, some PHYs implement an .aneg_done method which allows them to tell
the state machine when the AN is really complete.
The AR8031/AR8033 driver (at803x.c) is one such example. Even when
copper autoneg completes, the driver still keeps the "aneg_done"
variable unset until in-band SGMII autoneg finishes too (there is no
interrupt for that). So we have the premises of a race condition.

In practice, what really happens depends on the log level of the serial
console. If the log level is verbose enough that kernel messages related
to the Ethernet link state are printed to the console, then this gives
in-band AN enough time to complete, which means the link will come up
and everyone will be happy. But if the console is not that verbose, the
link will sometimes come up, and sometimes will be forced down by the
.aneg_done of the PHY driver (forever, since we are not rescheduling).

The conclusion is that an extra condition needs to be explicitly added,
so that the state machine can be rescheduled properly. Otherwise PHY
devices in interrupt mode will never work properly if they have an
.aneg_done callback.

In more recent kernels, the whole PHY_AN state was removed by Heiner
Kallweit in the "[net-next,0/5] net: phy: improve and simplify phylib
state machine" series here:

https://patchwork.ozlabs.org/cover/994464/

and the problem was just masked away instead of being addressed with a
punctual patch.

Fixes: 76a423a3f8f1 ("net: phy: allow driver to implement their own aneg_done")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
I'm not sure the procedure I'm following is correct, sending this
directly to Greg. The patch doesn't apply on net.

 drivers/net/phy/phy.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index cc454b8c032c..ca4fd74fd2c8 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -934,7 +934,7 @@ void phy_state_machine(struct work_struct *work)
 	struct delayed_work *dwork = to_delayed_work(work);
 	struct phy_device *phydev =
 			container_of(dwork, struct phy_device, state_queue);
-	bool needs_aneg = false, do_suspend = false;
+	bool recheck = false, needs_aneg = false, do_suspend = false;
 	enum phy_state old_state;
 	int err = 0;
 	int old_link;
@@ -981,6 +981,8 @@ void phy_state_machine(struct work_struct *work)
 			phy_link_up(phydev);
 		} else if (0 == phydev->link_timeout--)
 			needs_aneg = true;
+		else
+			recheck = true;
 		break;
 	case PHY_NOLINK:
 		if (!phy_polling_mode(phydev))
@@ -1123,7 +1125,7 @@ void phy_state_machine(struct work_struct *work)
 	 * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
 	 * between states from phy_mac_interrupt()
 	 */
-	if (phy_polling_mode(phydev))
+	if (phy_polling_mode(phydev) || recheck)
 		queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
 				   PHY_STATE_TIME * HZ);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread