All of lore.kernel.org
 help / color / mirror / Atom feed
* phydev control race condition while AUTONEG_DISABLE
       [not found] <5c5ee244-d40c-4e34-8652-a3f82b453543@nsg.net.ru>
@ 2020-05-18 16:40 ` Val Jurin
  2020-05-19 20:37   ` Heiner Kallweit
  0 siblings, 1 reply; 2+ messages in thread
From: Val Jurin @ 2020-05-18 16:40 UTC (permalink / raw)
  To: netdev

Hello  everybody.

This letter was sent to you because you are signed in as MAINTAINERS
  for  ETHERNET PHY LIBRARY topic.

Issue:
It seems I found "race condition" situation for PHY devices in 
AUTONEG_DISABLE mode.

Description:
I found this issue in my working kernel 4.11.7 but  I cloned git master 
from Linus' git master branch
and practically the same code is still in master. ( for now ver. 5.7.0-rc5 )

Please look at  the part of file
drivers/net/phy/phy.c
with my comments:

=== code ===
int phy_ethtool_ksettings_set(struct phy_device *phydev,
                               const struct ethtool_link_ksettings *cmd)
{

/*  THE PART OF THE CODE SKIPPED */

         phydev->autoneg = autoneg;

         phydev->speed = speed;

         linkmode_copy(phydev->advertising, advertising);

         linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
                          phydev->advertising, autoneg == AUTONEG_ENABLE);

         phydev->duplex = duplex;

/* LET'S MARK THIS CODE POINT AS THE "POINT 1" */

         phydev->mdix_ctrl = cmd->base.eth_tp_mdix_ctrl;

         /* Restart the PHY */

/* AND WE ARE GOING TO phy_start_aneg */

         phy_start_aneg(phydev);

         return 0;
}
EXPORT_SYMBOL(phy_ethtool_ksettings_set);
=== EoF code ===

Now look at the beginning of phy_start_aneg:

=== code ===
int phy_start_aneg(struct phy_device *phydev)
{
         int err;

         if (!phydev->drv)
                 return -EIO;

/* LET'S MARK THIS CODE POINT AS THE "POINT 2" */
/* JUST BEFORE MUTEX LOCK */
         mutex_lock(&phydev->lock);

/* THE FOLLOWING CODE SKIPPED */
=== EoF code ===


The statement.

If the short code chain between "POINT 1" and "POINT 2" is running
and physical device is in AUTONEG_DISABLE mode
and one of the following asynchronous events occur between these points 
of code:

a)  phydev related status change interrupt (with following hi-prio tasklet);
b)  task rescheduling which may call phydev status POLL routine
(if phydev is in non-interrupt-driven mode);

the following fields in struct phy_device:

phydev->speed
phydev->duplex

will  be rewritten by phy_device  read_status function.

For example - genphy_read_status ( more strict by 
genphy_read_status_fixed in kernel v.5 )

and the command from ethtool will not set correct speed and duplex which
stays with old values of speed/duplex.

The probability of this event is vanishingly small and practically may never
occur because:
a) most of phy devices work in AUTONEG_ENABLE mode;
b) the affected code chain is too short. It's just a dozen of CPU commands
mostly for functions epilogue and prologue and the probability of incoming
async events for phydev status change is practically close to zero while
this code chain is running.

Anyway, it is a cause for race condition.

How to emulate:
Just inject something like
  phydev->drv->config_aneg(phydev)
in any place of the described code chain between "POINT 1" and "POINT 2"

How it was found :
I've got a card with MX6UL (NXP ARM SOC) with 2x FEC + micrel 8081 PHY.
The interrupt line of one of these PHY devices was tied low by adjusters :-E
and this caused high frequency interrupts + phy read_status call with 
interrupt rate
limited by interrupt controller or something else. But it was enough to see
that ethtool sets speed and duplex one time of three (or more) tries.
Elsewhere it may never occur and it will never be found :)

Possible solution:
Everything is normal in AUTONEG_ENABLE mode because the fields
"speed" and "duplex" are not used in ethtool command by config_aneg
which uses advertise fields as command data in this case.

It looks like the struct phy_device must have separated fields for command
and for status change:

phydev->speed
phydev->duplex

must be left for status
but, for example

phydev->speed_req
phydev->duplex_req

must be used for ethtool in phy_ethtool_ksettings_set  (*)
and then used as command data in phy_start_aneg or later
in appropriate  place _after_  phy_start_aneg locks phydev mutex.

(*) as you know this is the default, other drivers may use their own 
ethtool ksettings

I can not do patch/changes myself because this code is historically very 
old and
trusted and very important and can affect not only the functions 
described above.
I think you better know what to do or leave it "as is"  because the
probability is too low. But as I suspect and as I was taught ...
if probability is very very close to zero and it is not  still strictly zero
just find the way what needs to be done to fix it.
That was the reason to send you this letter.

Thank you for attention.

Regards,
Valentine Jurin,
         hardware support, kernel and boot maintenance,
         NSG Ltd. , http://www.nsg.ru

P.S Also we want to say respects to you and to all kernel team for your 
great job.
Thank you very much.
We like Linux.


^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: phydev control race condition while AUTONEG_DISABLE
  2020-05-18 16:40 ` phydev control race condition while AUTONEG_DISABLE Val Jurin
@ 2020-05-19 20:37   ` Heiner Kallweit
  0 siblings, 0 replies; 2+ messages in thread
From: Heiner Kallweit @ 2020-05-19 20:37 UTC (permalink / raw)
  To: Val Jurin; +Cc: netdev, Andrew Lunn, Florian Fainelli, Russell King - ARM Linux

On 18.05.2020 18:40, Val Jurin wrote:
> Hello  everybody.
> 
> This letter was sent to you because you are signed in as MAINTAINERS
>  for  ETHERNET PHY LIBRARY topic.
> 
> Issue:
> It seems I found "race condition" situation for PHY devices in AUTONEG_DISABLE mode.
> 
> Description:
> I found this issue in my working kernel 4.11.7 but  I cloned git master from Linus' git master branch
> and practically the same code is still in master. ( for now ver. 5.7.0-rc5 )
> 
> Please look at  the part of file
> drivers/net/phy/phy.c
> with my comments:
> 
> === code ===
> int phy_ethtool_ksettings_set(struct phy_device *phydev,
>                               const struct ethtool_link_ksettings *cmd)
> {
> 
> /*  THE PART OF THE CODE SKIPPED */
> 
>         phydev->autoneg = autoneg;
> 
>         phydev->speed = speed;
> 
>         linkmode_copy(phydev->advertising, advertising);
> 
>         linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
>                          phydev->advertising, autoneg == AUTONEG_ENABLE);
> 
>         phydev->duplex = duplex;
> 
> /* LET'S MARK THIS CODE POINT AS THE "POINT 1" */
> 
>         phydev->mdix_ctrl = cmd->base.eth_tp_mdix_ctrl;
> 
>         /* Restart the PHY */
> 
> /* AND WE ARE GOING TO phy_start_aneg */
> 
>         phy_start_aneg(phydev);
> 
>         return 0;
> }
> EXPORT_SYMBOL(phy_ethtool_ksettings_set);
> === EoF code ===
> 
> Now look at the beginning of phy_start_aneg:
> 
> === code ===
> int phy_start_aneg(struct phy_device *phydev)
> {
>         int err;
> 
>         if (!phydev->drv)
>                 return -EIO;
> 
> /* LET'S MARK THIS CODE POINT AS THE "POINT 2" */
> /* JUST BEFORE MUTEX LOCK */
>         mutex_lock(&phydev->lock);
> 
> /* THE FOLLOWING CODE SKIPPED */
> === EoF code ===
> 
> 
> The statement.
> 
> If the short code chain between "POINT 1" and "POINT 2" is running
> and physical device is in AUTONEG_DISABLE mode
> and one of the following asynchronous events occur between these points of code:
> 
> a)  phydev related status change interrupt (with following hi-prio tasklet);
> b)  task rescheduling which may call phydev status POLL routine
> (if phydev is in non-interrupt-driven mode);
> 
> the following fields in struct phy_device:
> 
> phydev->speed
> phydev->duplex
> 
> will  be rewritten by phy_device  read_status function.
> 
> For example - genphy_read_status ( more strict by genphy_read_status_fixed in kernel v.5 )
> 
> and the command from ethtool will not set correct speed and duplex which
> stays with old values of speed/duplex.
> 
> The probability of this event is vanishingly small and practically may never
> occur because:
> a) most of phy devices work in AUTONEG_ENABLE mode;
> b) the affected code chain is too short. It's just a dozen of CPU commands
> mostly for functions epilogue and prologue and the probability of incoming
> async events for phydev status change is practically close to zero while
> this code chain is running.
> 
> Anyway, it is a cause for race condition.
> 
> How to emulate:
> Just inject something like
>  phydev->drv->config_aneg(phydev)
> in any place of the described code chain between "POINT 1" and "POINT 2"
> 
> How it was found :
> I've got a card with MX6UL (NXP ARM SOC) with 2x FEC + micrel 8081 PHY.
> The interrupt line of one of these PHY devices was tied low by adjusters :-E
> and this caused high frequency interrupts + phy read_status call with interrupt rate
> limited by interrupt controller or something else. But it was enough to see
> that ethtool sets speed and duplex one time of three (or more) tries.
> Elsewhere it may never occur and it will never be found :)
> 
> Possible solution:
> Everything is normal in AUTONEG_ENABLE mode because the fields
> "speed" and "duplex" are not used in ethtool command by config_aneg
> which uses advertise fields as command data in this case.
> 
> It looks like the struct phy_device must have separated fields for command
> and for status change:
> 
> phydev->speed
> phydev->duplex
> 
> must be left for status
> but, for example
> 
> phydev->speed_req
> phydev->duplex_req
> 
> must be used for ethtool in phy_ethtool_ksettings_set  (*)
> and then used as command data in phy_start_aneg or later
> in appropriate  place _after_  phy_start_aneg locks phydev mutex.
> 

I see the point. It should be sufficient to add an unlocked version
of phy_start_aneg() and protect the critical section in
phy_ethtool_ksettings_set() with the phydev mutex.
Could you please test with the two patches below?


> (*) as you know this is the default, other drivers may use their own ethtool ksettings
> 
> I can not do patch/changes myself because this code is historically very old and
> trusted and very important and can affect not only the functions described above.
> I think you better know what to do or leave it "as is"  because the
> probability is too low. But as I suspect and as I was taught ...
> if probability is very very close to zero and it is not  still strictly zero
> just find the way what needs to be done to fix it.
> That was the reason to send you this letter.
> 
> Thank you for attention.
> 
> Regards,
> Valentine Jurin,
>         hardware support, kernel and boot maintenance,
>         NSG Ltd. , http://www.nsg.ru
> 
> P.S Also we want to say respects to you and to all kernel team for your great job.
> Thank you very much.
> We like Linux.
> 

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d4bbf79da..d945be675 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -602,7 +602,7 @@ static int phy_check_link_status(struct phy_device *phydev)
 }
 
 /**
- * phy_start_aneg - start auto-negotiation for this PHY device
+ * __phy_start_aneg - start auto-negotiation for this PHY device
  * @phydev: the phy_device struct
  *
  * Description: Sanitizes the settings (if we're not autonegotiating
@@ -610,25 +610,33 @@ static int phy_check_link_status(struct phy_device *phydev)
  *   If the PHYCONTROL Layer is operating, we change the state to
  *   reflect the beginning of Auto-negotiation or forcing.
  */
-int phy_start_aneg(struct phy_device *phydev)
+int __phy_start_aneg(struct phy_device *phydev)
 {
 	int err;
 
 	if (!phydev->drv)
 		return -EIO;
 
-	mutex_lock(&phydev->lock);
-
 	if (AUTONEG_DISABLE == phydev->autoneg)
 		phy_sanitize_settings(phydev);
 
 	err = phy_config_aneg(phydev);
 	if (err < 0)
-		goto out_unlock;
+		return err;
 
 	if (phy_is_started(phydev))
 		err = phy_check_link_status(phydev);
-out_unlock:
+
+	return err;
+}
+EXPORT_SYMBOL(__phy_start_aneg);
+
+int phy_start_aneg(struct phy_device *phydev)
+{
+	int err;
+
+	mutex_lock(&phydev->lock);
+	err = __phy_start_aneg(phydev);
 	mutex_unlock(&phydev->lock);
 
 	return err;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5d8ff5428..ea9ba6d1c 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -1244,6 +1244,7 @@ void phy_disconnect(struct phy_device *phydev);
 void phy_detach(struct phy_device *phydev);
 void phy_start(struct phy_device *phydev);
 void phy_stop(struct phy_device *phydev);
+int __phy_start_aneg(struct phy_device *phydev);
 int phy_start_aneg(struct phy_device *phydev);
 int phy_aneg_done(struct phy_device *phydev);
 int phy_speed_down(struct phy_device *phydev, bool sync);
-- 
2.26.2



diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index d945be675..7bf37ec3f 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -291,6 +291,8 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
 	      duplex != DUPLEX_FULL)))
 		return -EINVAL;
 
+	mutex_lock(&phydev->lock);
+
 	phydev->autoneg = autoneg;
 
 	phydev->speed = speed;
@@ -305,7 +307,9 @@ int phy_ethtool_ksettings_set(struct phy_device *phydev,
 	phydev->mdix_ctrl = cmd->base.eth_tp_mdix_ctrl;
 
 	/* Restart the PHY */
-	phy_start_aneg(phydev);
+	__phy_start_aneg(phydev);
+
+	mutex_unlock(&phydev->lock);
 
 	return 0;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-05-19 20:37 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <5c5ee244-d40c-4e34-8652-a3f82b453543@nsg.net.ru>
2020-05-18 16:40 ` phydev control race condition while AUTONEG_DISABLE Val Jurin
2020-05-19 20:37   ` Heiner Kallweit

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.