* [PATCH] phy: added a PHY_BUSY state into phy_state_machine
@ 2019-07-07 22:32 kwangdo.yi
2019-07-08 3:07 ` Florian Fainelli
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: kwangdo.yi @ 2019-07-07 22:32 UTC (permalink / raw)
To: netdev; +Cc: kwangdo.yi
When mdio driver polling the phy state in the phy_state_machine,
sometimes it results in -ETIMEDOUT and link is down. But the phy
is still alive and just didn't meet the polling deadline.
Closing the phy link in this case seems too radical. Failing to
meet the deadline happens very rarely. When stress test runs for
tens of hours with multiple target boards (Xilinx Zynq7000 with
marvell 88E1512 PHY, Xilinx custom emac IP), it happens. This
patch gives another chance to the phy_state_machine when polling
timeout happens. Only two consecutive failing the deadline is
treated as the real phy halt and close the connection.
Signed-off-by: kwangdo.yi <kwangdo.yi@gmail.com>
---
drivers/net/phy/phy.c | 6 ++++++
include/linux/phy.h | 1 +
2 files changed, 7 insertions(+)
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index e888542..9e8138b 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -919,7 +919,13 @@ void phy_state_machine(struct work_struct *work)
break;
case PHY_NOLINK:
case PHY_RUNNING:
+ case PHY_BUSY:
err = phy_check_link_status(phydev);
+ if (err == -ETIMEDOUT && old_state == PHY_RUNNING) {
+ phy->state = PHY_BUSY;
+ err = 0;
+
+ }
break;
case PHY_FORCING:
err = genphy_update_link(phydev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 6424586..4a49401 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -313,6 +313,7 @@ enum phy_state {
PHY_RUNNING,
PHY_NOLINK,
PHY_FORCING,
+ PHY_BUSY,
};
/**
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] phy: added a PHY_BUSY state into phy_state_machine
2019-07-07 22:32 [PATCH] phy: added a PHY_BUSY state into phy_state_machine kwangdo.yi
@ 2019-07-08 3:07 ` Florian Fainelli
2019-07-08 6:03 ` Heiner Kallweit
2019-07-09 3:16 ` kwangdo yi
2019-07-08 4:42 ` Andrew Lunn
2019-07-09 1:58 ` kbuild test robot
2 siblings, 2 replies; 8+ messages in thread
From: Florian Fainelli @ 2019-07-08 3:07 UTC (permalink / raw)
To: kwangdo.yi, netdev, Andrew Lunn, Heiner Kallweit
+Andrew, Heiner (please CC PHY library maintainers).
On 7/7/2019 3:32 PM, kwangdo.yi wrote:
> When mdio driver polling the phy state in the phy_state_machine,
> sometimes it results in -ETIMEDOUT and link is down. But the phy
> is still alive and just didn't meet the polling deadline.
> Closing the phy link in this case seems too radical. Failing to
> meet the deadline happens very rarely. When stress test runs for
> tens of hours with multiple target boards (Xilinx Zynq7000 with
> marvell 88E1512 PHY, Xilinx custom emac IP), it happens. This
> patch gives another chance to the phy_state_machine when polling
> timeout happens. Only two consecutive failing the deadline is
> treated as the real phy halt and close the connection.
How about simply increasing the MDIO polling timeout in the Xilinx EMAC
driver instead? Or if the PHY is where the timeout needs to be
increased, allow the PHY device drivers to advertise min/max timeouts
such that the MDIO bus layer can use that information?
>
>
> Signed-off-by: kwangdo.yi <kwangdo.yi@gmail.com>
> ---
> drivers/net/phy/phy.c | 6 ++++++
> include/linux/phy.h | 1 +
> 2 files changed, 7 insertions(+)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index e888542..9e8138b 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -919,7 +919,13 @@ void phy_state_machine(struct work_struct *work)
> break;
> case PHY_NOLINK:
> case PHY_RUNNING:
> + case PHY_BUSY:
> err = phy_check_link_status(phydev);
> + if (err == -ETIMEDOUT && old_state == PHY_RUNNING) {
> + phy->state = PHY_BUSY;
> + err = 0;
> +
> + }
> break;
> case PHY_FORCING:
> err = genphy_update_link(phydev);
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 6424586..4a49401 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -313,6 +313,7 @@ enum phy_state {
> PHY_RUNNING,
> PHY_NOLINK,
> PHY_FORCING,
> + PHY_BUSY,
> };
>
> /**
>
--
Florian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] phy: added a PHY_BUSY state into phy_state_machine
2019-07-07 22:32 [PATCH] phy: added a PHY_BUSY state into phy_state_machine kwangdo.yi
2019-07-08 3:07 ` Florian Fainelli
@ 2019-07-08 4:42 ` Andrew Lunn
2019-07-09 1:58 ` kbuild test robot
2 siblings, 0 replies; 8+ messages in thread
From: Andrew Lunn @ 2019-07-08 4:42 UTC (permalink / raw)
To: kwangdo.yi; +Cc: netdev
On Sun, Jul 07, 2019 at 06:32:12PM -0400, kwangdo.yi wrote:
> When mdio driver polling the phy state in the phy_state_machine,
> sometimes it results in -ETIMEDOUT and link is down. But the phy
> is still alive and just didn't meet the polling deadline.
> Closing the phy link in this case seems too radical. Failing to
> meet the deadline happens very rarely. When stress test runs for
> tens of hours with multiple target boards (Xilinx Zynq7000 with
> marvell 88E1512 PHY, Xilinx custom emac IP), it happens. This
> patch gives another chance to the phy_state_machine when polling
> timeout happens. Only two consecutive failing the deadline is
> treated as the real phy halt and close the connection.
Hi Kwangdo
I agree with Florian here. This does not seem like a PHY problem. It
is an MDIO bus problem. ETIMEDOUT is only returned from
xemaclite_mdio_wait().
What value are using for HZ? If you have 1000, jiffies + 2 could well
be too short.
Andrew
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] phy: added a PHY_BUSY state into phy_state_machine
2019-07-08 3:07 ` Florian Fainelli
@ 2019-07-08 6:03 ` Heiner Kallweit
2019-07-09 3:16 ` kwangdo yi
1 sibling, 0 replies; 8+ messages in thread
From: Heiner Kallweit @ 2019-07-08 6:03 UTC (permalink / raw)
To: Florian Fainelli, kwangdo.yi, netdev, Andrew Lunn
On 08.07.2019 05:07, Florian Fainelli wrote:
> +Andrew, Heiner (please CC PHY library maintainers).
>
> On 7/7/2019 3:32 PM, kwangdo.yi wrote:
>> When mdio driver polling the phy state in the phy_state_machine,
>> sometimes it results in -ETIMEDOUT and link is down. But the phy
>> is still alive and just didn't meet the polling deadline.
>> Closing the phy link in this case seems too radical. Failing to
>> meet the deadline happens very rarely. When stress test runs for
>> tens of hours with multiple target boards (Xilinx Zynq7000 with
>> marvell 88E1512 PHY, Xilinx custom emac IP), it happens. This
>> patch gives another chance to the phy_state_machine when polling
>> timeout happens. Only two consecutive failing the deadline is
>> treated as the real phy halt and close the connection.
>
In addition to what Florian said already there's at least one
issue apart from the quite hacky approach in general.
Let's say we are in interrupt mode and the timeout happens when
reading the PHY status after a link-down interrupt. When ignoring
the error we miss the transition and phylib will report a wrong
link status.
I also would prefer to first check for the root cause and try to
fix it, before adding hacks to upper layers for ignoring errors.
> How about simply increasing the MDIO polling timeout in the Xilinx EMAC
> driver instead? Or if the PHY is where the timeout needs to be
> increased, allow the PHY device drivers to advertise min/max timeouts
> such that the MDIO bus layer can use that information?
>
>>
>>
>> Signed-off-by: kwangdo.yi <kwangdo.yi@gmail.com>
>> ---
>> drivers/net/phy/phy.c | 6 ++++++
>> include/linux/phy.h | 1 +
>> 2 files changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
>> index e888542..9e8138b 100644
>> --- a/drivers/net/phy/phy.c
>> +++ b/drivers/net/phy/phy.c
>> @@ -919,7 +919,13 @@ void phy_state_machine(struct work_struct *work)
>> break;
>> case PHY_NOLINK:
>> case PHY_RUNNING:
>> + case PHY_BUSY:
>> err = phy_check_link_status(phydev);
>> + if (err == -ETIMEDOUT && old_state == PHY_RUNNING) {
>> + phy->state = PHY_BUSY;
>> + err = 0;
>> +
>> + }
>> break;
>> case PHY_FORCING:
>> err = genphy_update_link(phydev);
>> diff --git a/include/linux/phy.h b/include/linux/phy.h
>> index 6424586..4a49401 100644
>> --- a/include/linux/phy.h
>> +++ b/include/linux/phy.h
>> @@ -313,6 +313,7 @@ enum phy_state {
>> PHY_RUNNING,
>> PHY_NOLINK,
>> PHY_FORCING,
>> + PHY_BUSY,
>> };
>>
>> /**
>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] phy: added a PHY_BUSY state into phy_state_machine
2019-07-07 22:32 [PATCH] phy: added a PHY_BUSY state into phy_state_machine kwangdo.yi
2019-07-08 3:07 ` Florian Fainelli
2019-07-08 4:42 ` Andrew Lunn
@ 2019-07-09 1:58 ` kbuild test robot
2 siblings, 0 replies; 8+ messages in thread
From: kbuild test robot @ 2019-07-09 1:58 UTC (permalink / raw)
To: kwangdo.yi; +Cc: kbuild-all, netdev, kwangdo.yi
[-- Attachment #1: Type: text/plain, Size: 4357 bytes --]
Hi "kwangdo.yi",
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v5.2]
[cannot apply to next-20190708]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/kwangdo-yi/phy-added-a-PHY_BUSY-state-into-phy_state_machine/20190709-075228
config: arm-omap2plus_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 7.4.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=arm
If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>
All error/warnings (new ones prefixed by >>):
drivers/net/phy/phy.c: In function 'phy_state_to_str':
>> drivers/net/phy/phy.c:38:2: warning: enumeration value 'PHY_BUSY' not handled in switch [-Wswitch]
switch (st) {
^~~~~~
drivers/net/phy/phy.c: In function 'phy_state_machine':
>> drivers/net/phy/phy.c:925:4: error: 'phy' undeclared (first use in this function)
phy->state = PHY_BUSY;
^~~
drivers/net/phy/phy.c:925:4: note: each undeclared identifier is reported only once for each function it appears in
vim +/phy +925 drivers/net/phy/phy.c
894
895 /**
896 * phy_state_machine - Handle the state machine
897 * @work: work_struct that describes the work to be done
898 */
899 void phy_state_machine(struct work_struct *work)
900 {
901 struct delayed_work *dwork = to_delayed_work(work);
902 struct phy_device *phydev =
903 container_of(dwork, struct phy_device, state_queue);
904 bool needs_aneg = false, do_suspend = false;
905 enum phy_state old_state;
906 int err = 0;
907
908 mutex_lock(&phydev->lock);
909
910 old_state = phydev->state;
911
912 switch (phydev->state) {
913 case PHY_DOWN:
914 case PHY_READY:
915 break;
916 case PHY_UP:
917 needs_aneg = true;
918
919 break;
920 case PHY_NOLINK:
921 case PHY_RUNNING:
922 case PHY_BUSY:
923 err = phy_check_link_status(phydev);
924 if (err == -ETIMEDOUT && old_state == PHY_RUNNING) {
> 925 phy->state = PHY_BUSY;
926 err = 0;
927
928 }
929 break;
930 case PHY_FORCING:
931 err = genphy_update_link(phydev);
932 if (err)
933 break;
934
935 if (phydev->link) {
936 phydev->state = PHY_RUNNING;
937 phy_link_up(phydev);
938 } else {
939 if (0 == phydev->link_timeout--)
940 needs_aneg = true;
941 phy_link_down(phydev, false);
942 }
943 break;
944 case PHY_HALTED:
945 if (phydev->link) {
946 phydev->link = 0;
947 phy_link_down(phydev, true);
948 do_suspend = true;
949 }
950 break;
951 }
952
953 mutex_unlock(&phydev->lock);
954
955 if (needs_aneg)
956 err = phy_start_aneg(phydev);
957 else if (do_suspend)
958 phy_suspend(phydev);
959
960 if (err < 0)
961 phy_error(phydev);
962
963 if (old_state != phydev->state) {
964 phydev_dbg(phydev, "PHY state change %s -> %s\n",
965 phy_state_to_str(old_state),
966 phy_state_to_str(phydev->state));
967 if (phydev->drv && phydev->drv->link_change_notify)
968 phydev->drv->link_change_notify(phydev);
969 }
970
971 /* Only re-schedule a PHY state machine change if we are polling the
972 * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
973 * between states from phy_mac_interrupt().
974 *
975 * In state PHY_HALTED the PHY gets suspended, so rescheduling the
976 * state machine would be pointless and possibly error prone when
977 * called from phy_disconnect() synchronously.
978 */
979 mutex_lock(&phydev->lock);
980 if (phy_polling_mode(phydev) && phy_is_started(phydev))
981 phy_queue_state_machine(phydev, PHY_STATE_TIME);
982 mutex_unlock(&phydev->lock);
983 }
984
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36418 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] phy: added a PHY_BUSY state into phy_state_machine
2019-07-08 3:07 ` Florian Fainelli
2019-07-08 6:03 ` Heiner Kallweit
@ 2019-07-09 3:16 ` kwangdo yi
2019-07-09 3:22 ` Andrew Lunn
1 sibling, 1 reply; 8+ messages in thread
From: kwangdo yi @ 2019-07-09 3:16 UTC (permalink / raw)
To: Florian Fainelli; +Cc: netdev, Andrew Lunn, Heiner Kallweit
I simply fixed this issue by increasing the polling time from 20 msec to
60 msec in Xilinx EMAC driver. But the state machine would be in a
better shape if it is capable of handling sub system driver's fake failure.
PHY device driver could advertising the min/max timeouts for its subsystem,
but still some vendor's EMAC driver fails to meet the deadline if this value
is not set properly in PHY driver.
On Sun, Jul 7, 2019 at 11:07 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> +Andrew, Heiner (please CC PHY library maintainers).
>
> On 7/7/2019 3:32 PM, kwangdo.yi wrote:
> > When mdio driver polling the phy state in the phy_state_machine,
> > sometimes it results in -ETIMEDOUT and link is down. But the phy
> > is still alive and just didn't meet the polling deadline.
> > Closing the phy link in this case seems too radical. Failing to
> > meet the deadline happens very rarely. When stress test runs for
> > tens of hours with multiple target boards (Xilinx Zynq7000 with
> > marvell 88E1512 PHY, Xilinx custom emac IP), it happens. This
> > patch gives another chance to the phy_state_machine when polling
> > timeout happens. Only two consecutive failing the deadline is
> > treated as the real phy halt and close the connection.
>
> How about simply increasing the MDIO polling timeout in the Xilinx EMAC
> driver instead? Or if the PHY is where the timeout needs to be
> increased, allow the PHY device drivers to advertise min/max timeouts
> such that the MDIO bus layer can use that information?
>
> >
> >
> > Signed-off-by: kwangdo.yi <kwangdo.yi@gmail.com>
> > ---
> > drivers/net/phy/phy.c | 6 ++++++
> > include/linux/phy.h | 1 +
> > 2 files changed, 7 insertions(+)
> >
> > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> > index e888542..9e8138b 100644
> > --- a/drivers/net/phy/phy.c
> > +++ b/drivers/net/phy/phy.c
> > @@ -919,7 +919,13 @@ void phy_state_machine(struct work_struct *work)
> > break;
> > case PHY_NOLINK:
> > case PHY_RUNNING:
> > + case PHY_BUSY:
> > err = phy_check_link_status(phydev);
> > + if (err == -ETIMEDOUT && old_state == PHY_RUNNING) {
> > + phy->state = PHY_BUSY;
> > + err = 0;
> > +
> > + }
> > break;
> > case PHY_FORCING:
> > err = genphy_update_link(phydev);
> > diff --git a/include/linux/phy.h b/include/linux/phy.h
> > index 6424586..4a49401 100644
> > --- a/include/linux/phy.h
> > +++ b/include/linux/phy.h
> > @@ -313,6 +313,7 @@ enum phy_state {
> > PHY_RUNNING,
> > PHY_NOLINK,
> > PHY_FORCING,
> > + PHY_BUSY,
> > };
> >
> > /**
> >
>
> --
> Florian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] phy: added a PHY_BUSY state into phy_state_machine
2019-07-09 3:16 ` kwangdo yi
@ 2019-07-09 3:22 ` Andrew Lunn
2019-07-09 3:31 ` kwangdo yi
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Lunn @ 2019-07-09 3:22 UTC (permalink / raw)
To: kwangdo yi; +Cc: Florian Fainelli, netdev, Heiner Kallweit
On Mon, Jul 08, 2019 at 11:16:02PM -0400, kwangdo yi wrote:
> I simply fixed this issue by increasing the polling time from 20 msec to
> 60 msec in Xilinx EMAC driver. But the state machine would be in a
> better shape if it is capable of handling sub system driver's fake failure.
> PHY device driver could advertising the min/max timeouts for its subsystem,
> but still some vendor's EMAC driver fails to meet the deadline if this value
> is not set properly in PHY driver.
Hi Kwangdo
That is not how MDIO works. The PHY has two clock cycles to prepare
its response to any request. There is no min/max. This was always an
MDIO bus driver problem, not a PHY problem.
Andrew
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] phy: added a PHY_BUSY state into phy_state_machine
2019-07-09 3:22 ` Andrew Lunn
@ 2019-07-09 3:31 ` kwangdo yi
0 siblings, 0 replies; 8+ messages in thread
From: kwangdo yi @ 2019-07-09 3:31 UTC (permalink / raw)
To: Andrew Lunn; +Cc: Florian Fainelli, netdev, Heiner Kallweit
On Mon, Jul 8, 2019 at 11:22 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Mon, Jul 08, 2019 at 11:16:02PM -0400, kwangdo yi wrote:
> > I simply fixed this issue by increasing the polling time from 20 msec to
> > 60 msec in Xilinx EMAC driver. But the state machine would be in a
> > better shape if it is capable of handling sub system driver's fake failure.
> > PHY device driver could advertising the min/max timeouts for its subsystem,
> > but still some vendor's EMAC driver fails to meet the deadline if this value
> > is not set properly in PHY driver.
>
> Hi Kwangdo
>
> That is not how MDIO works. The PHY has two clock cycles to prepare
> its response to any request. There is no min/max. This was always an
> MDIO bus driver problem, not a PHY problem.
>
> Andrew
Hi Andrew,
I don't think PHY driver has a problem, nor EMAC driver has, but if PHY driver
is capable of handling EMAC driver's fake failure, the PHY driver would be in
a better fit. That's the intention of this patch.
But, it seems this timeout needs to be handled in each MDIO driver properly.
Thanks.
Regards,
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-07-09 3:32 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-07 22:32 [PATCH] phy: added a PHY_BUSY state into phy_state_machine kwangdo.yi
2019-07-08 3:07 ` Florian Fainelli
2019-07-08 6:03 ` Heiner Kallweit
2019-07-09 3:16 ` kwangdo yi
2019-07-09 3:22 ` Andrew Lunn
2019-07-09 3:31 ` kwangdo yi
2019-07-08 4:42 ` Andrew Lunn
2019-07-09 1:58 ` kbuild test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).