linux-kernel.vger.kernel.org archive mirror
* [PATCH net] net: phylink: fix link state on phy-connect
@ 2017-11-28 13:29 Antoine Tenart
  2017-11-28 13:56 ` Andrew Lunn
  2017-11-28 15:53 ` Russell King
  0 siblings, 2 replies; 19+ messages in thread
From: Antoine Tenart @ 2017-11-28 13:29 UTC (permalink / raw)
  To: rmk, andrew, f.fainelli, davem
  Cc: Yan Markman, gregory.clement, thomas.petazzoni, miquel.raynal,
	nadavh, mw, stefanc, netdev, linux-kernel, Antoine Tenart

From: Yan Markman <ymarkman@marvell.com>

When calling _connect, _disconnect and _connect again in succession, if
the link configuration changed while the link was down from phylink's
perspective, the last _connect would keep the stale, incorrect speed.
Fix this by setting the link configuration parameters to unknown
values when calling phylink_bringup_phy.

Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message]
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
---
 drivers/net/phy/phylink.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index e3bbc70372d3..c2cec3eef67d 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -621,6 +621,16 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy)
 	if (ret)
 		return ret;
 
+	/* On _disconnect, the PHY state machine and phylink resolve
+	 * are stopped before a full graceful down/reset has completed,
+	 * so the subsequent _connect starts from a stale state. Reset
+	 * the initial values here.
+	 */
+	pl->phy_state.link = false;
+	pl->link_config.pause = MLO_PAUSE_AN;
+	pl->link_config.speed = SPEED_UNKNOWN;
+	pl->link_config.duplex = DUPLEX_UNKNOWN;
+
 	phy->phylink = pl;
 	phy->phy_link_change = phylink_phy_change;
 
-- 
2.14.3
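The failure mode in the commit message can be reduced to a toy user-space model: if the resolve step only acts on a link transition, a stale `link = true` left over from before _disconnect hides the new PHY state after the next _connect. This is a sketch under stated assumptions, not phylink's code: `struct model_phylink`, `model_bringup_phy()` and `model_resolve()` are hypothetical names, and the "act only on a transition" rule is a simplification of `phylink_resolve()`. Only the `SPEED_UNKNOWN`, `DUPLEX_UNKNOWN` and `DUPLEX_FULL` values mirror the kernel's ethtool definitions.

```c
#include <stdbool.h>

/* Same values as the kernel's ethtool definitions. */
#define SPEED_UNKNOWN  (-1)
#define DUPLEX_FULL    0x01
#define DUPLEX_UNKNOWN 0xff

/* Hypothetical, heavily simplified stand-ins for phylink's state. */
struct model_link_state {
	bool link;
	int speed;
	int duplex;
};

struct model_phylink {
	struct model_link_state phy_state;   /* last state seen from the PHY */
	struct model_link_state link_config; /* what the MAC was configured for */
};

/* Mirrors the reset this patch adds to phylink_bringup_phy(). */
static void model_bringup_phy(struct model_phylink *pl)
{
	pl->phy_state.link = false;
	pl->link_config.speed = SPEED_UNKNOWN;
	pl->link_config.duplex = DUPLEX_UNKNOWN;
}

/*
 * Toy resolve: only reconfigure the MAC when the PHY's link state differs
 * from the last one recorded. Returns true if the MAC was reconfigured.
 */
static bool model_resolve(struct model_phylink *pl, bool phy_link,
			  int phy_speed, int phy_duplex)
{
	if (phy_link == pl->phy_state.link)
		return false; /* no transition seen: the stale config is kept */

	pl->phy_state.link = phy_link;
	if (phy_link) {
		pl->link_config.speed = phy_speed;
		pl->link_config.duplex = phy_duplex;
	}
	return true;
}
```

In this model, a sequence of link up at 1000, _disconnect, remote renegotiation to 100, and _connect leaves `link_config.speed` at 1000 unless `model_bringup_phy()` runs on the second _connect; with the reset, the next resolve sees a transition and picks up 100.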


* Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-28 13:29 [PATCH net] net: phylink: fix link state on phy-connect Antoine Tenart
@ 2017-11-28 13:56 ` Andrew Lunn
  2017-11-28 14:10   ` Antoine Tenart
  2017-11-28 15:53 ` Russell King
  1 sibling, 1 reply; 19+ messages in thread
From: Andrew Lunn @ 2017-11-28 13:56 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: rmk, f.fainelli, davem, Yan Markman, gregory.clement,
	thomas.petazzoni, miquel.raynal, nadavh, mw, stefanc, netdev,
	linux-kernel

> +	/* On _disconnect, the phy state machine and phylink resolve
> +	 * are stopped before executing full gracefull down/reset state.
> +	 * The further _connect starts with incorrect init state. Let's set
> +	 * init values here.
> +	 */
> +	pl->phy_state.link = false;
> +	pl->link_config.pause = MLO_PAUSE_AN;
> +	pl->link_config.speed = SPEED_UNKNOWN;
> +	pl->link_config.duplex = DUPLEX_UNKNOWN;

Hi Antoine

Looks sensible. My only comment would be, maybe it makes sense to
reduce the duplication by adding a little helper which is called here,
and in phylink_create()?

    Andrew
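Andrew's suggestion can be sketched as a single reset helper shared by the create and bringup paths, so the two initialisations cannot drift apart. The names below (`model_reset_link_state()`, `model_create()`, `model_bringup_phy()`) are hypothetical user-space stand-ins for `phylink_create()` and `phylink_bringup_phy()`, and the fields are a reduced subset of the real ones.

```c
#include <stdbool.h>
#include <stdlib.h>

#define SPEED_UNKNOWN  (-1)
#define DUPLEX_UNKNOWN 0xff

struct model_link_state {
	bool link;
	int speed;
	int duplex;
};

struct model_phylink {
	struct model_link_state phy_state;
	struct model_link_state link_config;
};

/* Hypothetical helper: the one place that defines the "unknown" state. */
static void model_reset_link_state(struct model_phylink *pl)
{
	pl->phy_state.link = false;
	pl->link_config.speed = SPEED_UNKNOWN;
	pl->link_config.duplex = DUPLEX_UNKNOWN;
}

/* Stand-in for phylink_create(): allocate, then start from "unknown". */
static struct model_phylink *model_create(void)
{
	struct model_phylink *pl = calloc(1, sizeof(*pl));

	if (pl)
		model_reset_link_state(pl);
	return pl;
}

/* Stand-in for phylink_bringup_phy(): reuse the same helper. */
static void model_bringup_phy(struct model_phylink *pl)
{
	model_reset_link_state(pl);
}
```

With the reset in one function, a field added later (such as the pause setting the patch also resets) only has to be handled in one place.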


* Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-28 13:56 ` Andrew Lunn
@ 2017-11-28 14:10   ` Antoine Tenart
  0 siblings, 0 replies; 19+ messages in thread
From: Antoine Tenart @ 2017-11-28 14:10 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Antoine Tenart, rmk, f.fainelli, davem, Yan Markman,
	gregory.clement, thomas.petazzoni, miquel.raynal, nadavh, mw,
	stefanc, netdev, linux-kernel

Hi Andrew,

On Tue, Nov 28, 2017 at 02:56:10PM +0100, Andrew Lunn wrote:
> > +	/* On _disconnect, the phy state machine and phylink resolve
> > +	 * are stopped before executing full gracefull down/reset state.
> > +	 * The further _connect starts with incorrect init state. Let's set
> > +	 * init values here.
> > +	 */
> > +	pl->phy_state.link = false;
> > +	pl->link_config.pause = MLO_PAUSE_AN;
> > +	pl->link_config.speed = SPEED_UNKNOWN;
> > +	pl->link_config.duplex = DUPLEX_UNKNOWN;
> 
> Looks sensible. My only comment would be, maybe it makes sense to
> reduce the duplication by adding a little helper which is called here,
> and in phylink_create()?

Yes, it could be better to have a little helper to reset the link config
and state values and avoid duplication.

I'll wait for more comments before sending a v2 though. Maybe Russell
will have suggestions on the right way to do this.

Thanks!
Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


* Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-28 13:29 [PATCH net] net: phylink: fix link state on phy-connect Antoine Tenart
  2017-11-28 13:56 ` Andrew Lunn
@ 2017-11-28 15:53 ` Russell King
  2017-11-28 15:56   ` Russell King
  1 sibling, 1 reply; 19+ messages in thread
From: Russell King @ 2017-11-28 15:53 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: andrew, f.fainelli, davem, Yan Markman, gregory.clement,
	thomas.petazzoni, miquel.raynal, nadavh, mw, stefanc, netdev,
	linux-kernel

On Tue, Nov 28, 2017 at 02:29:32PM +0100, Antoine Tenart wrote:
> From: Yan Markman <ymarkman@marvell.com>

Hi, thanks for the patch.

> When calling successively _connect, _disconnect and _connect again, if
> the link configuration changed whilst being down from the phylink
> perspective, the last _connect would stay in an incorrect old speed.
> Fixes this by setting the link configuration parameters to an unknown
> value when calling phylink_bringup_phy.

Under what circumstances does this occur?

> 
> Fixes: 9525ae83959b ("phylink: add phylink infrastructure")
> Signed-off-by: Yan Markman <ymarkman@marvell.com>
> [Antoine: commit message]
> Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
> ---
>  drivers/net/phy/phylink.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index e3bbc70372d3..c2cec3eef67d 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -621,6 +621,16 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy)
>  	if (ret)
>  		return ret;
>  
> +	/* On _disconnect, the phy state machine and phylink resolve
> +	 * are stopped before executing full gracefull down/reset state.
> +	 * The further _connect starts with incorrect init state. Let's set
> +	 * init values here.
> +	 */
> +	pl->phy_state.link = false;
> +	pl->link_config.pause = MLO_PAUSE_AN;
> +	pl->link_config.speed = SPEED_UNKNOWN;
> +	pl->link_config.duplex = DUPLEX_UNKNOWN;

It would be much better to clean up the phy_state in
phylink_disconnect_phy() and trigger a resolve, rather than doing that
each time a PHY is connected - the link should be taken down when the
PHY is removed.
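Russell's alternative moves the cleanup to the disconnect path: clear the PHY's link state and run a resolve, so the MAC sees the link go down when the PHY goes away. A toy sketch of that ordering, assuming a simplified resolve that only acts on a change between the PHY's view and the carrier state; `struct model_phylink`, `model_resolve()` and `model_disconnect_phy()` are hypothetical, and in phylink itself the resolve runs as queued work that is waited for with `flush_work()`.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant phylink state. */
struct model_phylink {
	bool has_phy;            /* stands in for pl->phydev != NULL */
	bool phy_link;           /* stands in for pl->phy_state.link */
	bool carrier;            /* stands in for netif_carrier_ok(ndev) */
	int mac_link_down_calls; /* times mac_link_down() would have run */
};

/* Toy resolve: only act when the PHY's link state and the carrier differ. */
static void model_resolve(struct model_phylink *pl)
{
	bool link = pl->has_phy && pl->phy_link;

	if (link == pl->carrier)
		return;
	pl->carrier = link;
	if (!link)
		pl->mac_link_down_calls++;
}

/* Disconnect that takes the link down, as suggested in the review. */
static void model_disconnect_phy(struct model_phylink *pl)
{
	pl->has_phy = false;
	pl->phy_link = false; /* clear stale PHY state on the way out */
	model_resolve(pl);    /* phylink would queue and flush its resolve work */
}
```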

However, I'd like to know under what circumstances this is happening,
since, if you're hotplugging a PHY you should be doing that via SFP
which has additional link up/down handling.  What board is this with?

Also note that there are a number of patches in my "phy" branch that
I'm intending to send as a result of working with Florian over the
last few weeks.  There are several people working fairly independently
in this area and having everyone send patches independently of each
other could get painful to manage.

I'm intending to send patches once I know that net-next is open.

-- 
Russell King
ARM architecture Linux Kernel maintainer


* Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-28 15:53 ` Russell King
@ 2017-11-28 15:56   ` Russell King
  2017-11-29  7:22     ` Antoine Tenart
  2017-11-29 19:33     ` [EXT] " Yan Markman
  0 siblings, 2 replies; 19+ messages in thread
From: Russell King @ 2017-11-28 15:56 UTC (permalink / raw)
  To: Antoine Tenart
  Cc: andrew, f.fainelli, davem, Yan Markman, gregory.clement,
	thomas.petazzoni, miquel.raynal, nadavh, mw, stefanc, netdev,
	linux-kernel

Oh, and lastly, please send patches to linux@armlinux.org.uk or the
address I use in the sign-offs - sending them to rmk@armlinux.org.uk
is for personal non-Linux mail only, and has resulted in _all_ of
these messages ending up in my spam folder.

Thanks.

-- 
Russell King
ARM architecture Linux Kernel maintainer


* Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-28 15:56   ` Russell King
@ 2017-11-29  7:22     ` Antoine Tenart
  2017-11-29 19:33     ` [EXT] " Yan Markman
  1 sibling, 0 replies; 19+ messages in thread
From: Antoine Tenart @ 2017-11-29  7:22 UTC (permalink / raw)
  To: Russell King
  Cc: Antoine Tenart, andrew, f.fainelli, davem, Yan Markman,
	gregory.clement, thomas.petazzoni, miquel.raynal, nadavh, mw,
	stefanc, netdev, linux-kernel

Hi Russell,

On Tue, Nov 28, 2017 at 03:56:11PM +0000, Russell King wrote:
> Oh, and lastly, please send patches to linux@armlinux.org.uk or the
> address I use in the sign-offs - sending them to rmk@armlinux.org.uk
> is for personal non-Linux mail only, and has resulted in _all_ of
> these messages ending up in my spam folder.

OK, will do. (By the way, I took that address from the MV88X3310 section
of the MAINTAINERS file.)

Antoine

-- 
Antoine Ténart, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


* RE: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-28 15:56   ` Russell King
  2017-11-29  7:22     ` Antoine Tenart
@ 2017-11-29 19:33     ` Yan Markman
  2017-11-29 19:59       ` Russell King - ARM Linux
  1 sibling, 1 reply; 19+ messages in thread
From: Yan Markman @ 2017-11-29 19:33 UTC (permalink / raw)
  To: Russell King, Antoine Tenart
  Cc: andrew, f.fainelli, davem, gregory.clement, thomas.petazzoni,
	miquel.raynal, Nadav Haklai, mw, Stefan Chulski, netdev,
	linux-kernel

Hi Russell,

On my board I have a Marvell 88E1510 PHY working in status-polling mode.
I see some inconsistencies: the first ifconfig up behaves differently from subsequent ones, and "Link is Down" is never reported.
Please refer to the behavior example below.
My patch is a "simple solution": it always resets/clears the link state parameters before going up.
A more correct (but much more complicated) solution would possibly be to modify the PHY state machine and phylink resolve.
What I found is that:
    On ifconfig down, the PHY state machine and phylink resolve
    are stopped before passing through a full graceful down/reset state.
    The subsequent ifconfig up starts with the old state parameters.
Two special cases, not tested but logically affected:
   the remote side changes speed while the link is down, or while it is disconnected; the local ifconfig up then starts with the old speed.

Best regards
Yan Markman
----------------------------------------------------
EXAMPLE:
buildroot login: root
~# ifconfig eth1 192.169.0.81 up
[   34.072042] mvpp2 f2000000.ethernet eth1: PHY [f212a200.mdio-mii:01] driver [Marvell 88E1510]
[   34.080654] mvpp2 f2000000.ethernet eth1: configuring for phy/rgmii-id link mode
[   37.220506] mvpp2 f2000000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off

~# ifconfig eth1 down
          (no "Link is Down" message is printed)

~# ifconfig eth1 up
          ("Link is Up" is printed twice:)
[   60.748041] mvpp2 f2000000.ethernet eth1: PHY [f212a200.mdio-mii:01] driver [Marvell 88E1510]
[   60.756653] mvpp2 f2000000.ethernet eth1: configuring for phy/rgmii-id link mode
[   60.764169] mvpp2 f2000000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
[   63.908504] mvpp2 f2000000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off

On physical link disconnect/break:
          (no "Link is Down" message is printed,
          but the link is in the correct state: the interface is UP but not RUNNING)
On physical link reconnect:
[   84.388501] mvpp2 f2000000.ethernet eth1: Link is Up - 1Gbps/Full - flow control off
-------------------------------------------------------------------------------------------
YAN's findings:


Best regards
Yan Markman


* Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-29 19:33     ` [EXT] " Yan Markman
@ 2017-11-29 19:59       ` Russell King - ARM Linux
  2017-11-29 21:06         ` [EXT] " Yan Markman
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2017-11-29 19:59 UTC (permalink / raw)
  To: Yan Markman
  Cc: Antoine Tenart, andrew, f.fainelli, davem, gregory.clement,
	thomas.petazzoni, miquel.raynal, Nadav Haklai, mw,
	Stefan Chulski, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2076 bytes --]

On Wed, Nov 29, 2017 at 07:33:44PM +0000, Yan Markman wrote:
> Hi Russel
> 
> On my board I have [Marvell 88E1510] phy working with STATUS-POLLING
> I see some inconsistencies  -- first ifconfig-up is different from furthers, no "link is down" reports.
> Please refer the behavior example below.
> My patch is a "simple solution"  -- always reset/clear Link-state-parameters before going UP.
> Possibly, more correct (but much more complicated) solution would be in the   phy state machine   and   phylink resolve modification.
> I just found that 
>     On ifconfig-down, the phy-state-machine and phylink-resolve
>     are stopped before executing before passing over full graceful down/reset state.
>     The further ifconfig-up starts with old state parameters.
> Special cases not-tested but logic 2 test-cases are:
>    remote side changes speed whilst link is Down or Disconnected. But local ifconfig-up starts with old speed.

Hi,

I think this is covered in my "phy" branch - but could probably do with
further testing, specifically this patch (which I've attached):

"phylink: ensure we take the link down when phylink_stop() is called"

This takes the link down on the MAC side synchronously when phylink_stop()
is called.  However, I think your case might also benefit from this
patch - please test the patch referred to without this change, and let
me know if you need this change to solve your problem:

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 8f43f8779317..c90ad50204b0 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -798,6 +798,7 @@ void phylink_disconnect_phy(struct phylink *pl)
 		mutex_lock(&pl->state_mutex);
 		pl->netdev->phydev = NULL;
 		pl->phydev = NULL;
+		pl->phy_state.link = false;
 		mutex_unlock(&pl->state_mutex);
 		mutex_unlock(&phy->lock);
 		flush_work(&pl->resolve);

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

[-- Attachment #2: p21 --]
[-- Type: text/plain, Size: 909 bytes --]

From: Russell King <rmk+kernel@armlinux.org.uk>
Subject: [PATCH] phylink: ensure we take the link down when phylink_stop() is
 called

Ensure that we tell the MAC to take the link down when phylink_stop()
is called.

Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/phylink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index fb7ae3f925f8..cb446b8acac2 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -880,6 +880,7 @@ void phylink_stop(struct phylink *pl)
 		sfp_upstream_stop(pl->sfp_bus);
 
 	set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
+	queue_work(system_power_efficient_wq, &pl->resolve);
 	flush_work(&pl->resolve);
 }
 EXPORT_SYMBOL_GPL(phylink_stop);
-- 
2.7.4
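Why the added `queue_work()` matters: `flush_work()` only waits for work that is already queued or running, so without it `phylink_stop()` can return without any resolve having observed `PHYLINK_DISABLE_STOPPED`. The toy model below makes that visible with a single-threaded stand-in for the workqueue. Every `model_*` name and `MODEL_DISABLE_STOPPED` is hypothetical, and the real `phylink_resolve()` does considerably more than forcing the link down when a disable bit is set.

```c
#include <stdbool.h>

#define MODEL_DISABLE_STOPPED (1u << 0) /* stands in for PHYLINK_DISABLE_STOPPED */

/* Single-threaded stand-in for a work item on a workqueue. */
struct model_work {
	bool pending;
	void (*fn)(void *arg);
	void *arg;
};

static void model_queue_work(struct model_work *w)
{
	w->pending = true;
}

/* Like flush_work(): run pending work to completion; a no-op if idle. */
static void model_flush_work(struct model_work *w)
{
	if (w->pending) {
		w->pending = false;
		w->fn(w->arg);
	}
}

struct model_phylink {
	unsigned long disable_state;
	bool phy_link;
	bool carrier;
	int mac_link_down_calls;
	struct model_work resolve;
};

/* Toy resolve: any disable bit forces the link down. */
static void model_resolve(void *arg)
{
	struct model_phylink *pl = arg;
	bool link = !pl->disable_state && pl->phy_link;

	if (link == pl->carrier)
		return;
	pl->carrier = link;
	if (!link)
		pl->mac_link_down_calls++; /* where mac_link_down() would run */
}

/* with_p21 selects whether the line added by the p21 patch is present. */
static void model_stop(struct model_phylink *pl, bool with_p21)
{
	pl->disable_state |= MODEL_DISABLE_STOPPED;
	if (with_p21)
		model_queue_work(&pl->resolve);
	model_flush_work(&pl->resolve);
}

/* Start from "link up, carrier on, nothing queued". */
static void model_init(struct model_phylink *pl)
{
	*pl = (struct model_phylink){ .phy_link = true, .carrier = true };
	pl->resolve.fn = model_resolve;
	pl->resolve.arg = pl;
}
```

In this model, stopping without the extra `model_queue_work()` leaves the carrier on and never reaches the `mac_link_down()` point; with it, the resolve runs once before `model_stop()` returns.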



* RE: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-29 19:59       ` Russell King - ARM Linux
@ 2017-11-29 21:06         ` Yan Markman
  2017-11-29 21:20           ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Yan Markman @ 2017-11-29 21:06 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Antoine Tenart, andrew, f.fainelli, davem, gregory.clement,
	thomas.petazzoni, miquel.raynal, Nadav Haklai, mw,
	Stefan Chulski, netdev, linux-kernel

The attached p21 patch doesn't change anything,
but the other one, in the body of your mail, is good:
	void phylink_disconnect_phy(struct phylink *pl)
	+		pl->phy_state.link = false;

There is still a problem (not for my MRVL-PP2):
	on ifconfig down, the callback
		pl->ops->mac_link_down(ndev, pl->link_an_mode);
	is expected to be called, but it isn't.



* Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-29 21:06         ` [EXT] " Yan Markman
@ 2017-11-29 21:20           ` Russell King - ARM Linux
  2017-11-30  8:51             ` Yan Markman
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2017-11-29 21:20 UTC (permalink / raw)
  To: Yan Markman
  Cc: Antoine Tenart, andrew, f.fainelli, davem, gregory.clement,
	thomas.petazzoni, miquel.raynal, Nadav Haklai, mw,
	Stefan Chulski, netdev, linux-kernel

On Wed, Nov 29, 2017 at 09:06:56PM +0000, Yan Markman wrote:
> The attached p21 patch doesn't change anything.
> But another one from the mail-text is good
> 	void phylink_disconnect_phy(struct phylink *pl)
> 	+		pl->phy_state.link = false;
> 
> There still (not for my MRVL-PP2) problem:
> 	It is expected that on  ifconfig-down the callback
> 		pl->ops->mac_link_down(ndev, pl->link_an_mode);
> would be called, but it isn't

Are you calling phylink_stop(), or are you just calling phylink_disconnect_phy()?

You must call phylink_stop() prior to phylink_disconnect_phy().  This
probably explains why the p21 patch did nothing.
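The ordering described above, as it might look in a MAC driver's `.ndo_stop` handler: stop phylink first, then detach the PHY. A toy sketch that records the call order; `model_ndo_stop()` and the other `model_*` functions are hypothetical. The real `phylink_disconnect_phy()` returns `void` and does not check the ordering for you; the `-1` here only illustrates the rule.

```c
#include <stdbool.h>
#include <string.h>

struct model_phylink {
	bool started;
	bool has_phy;
	char log[64]; /* records the call order */
};

static void model_phylink_stop(struct model_phylink *pl)
{
	pl->started = false;
	strcat(pl->log, "stop;");
}

/* Returns -1 (illustration only) if called while phylink is still started. */
static int model_phylink_disconnect_phy(struct model_phylink *pl)
{
	if (pl->started)
		return -1;
	pl->has_phy = false;
	strcat(pl->log, "disconnect;");
	return 0;
}

/* Hypothetical .ndo_stop: stop first, then disconnect the PHY. */
static int model_ndo_stop(struct model_phylink *pl)
{
	model_phylink_stop(pl);
	return model_phylink_disconnect_phy(pl);
}
```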

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


* RE: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-29 21:20           ` Russell King - ARM Linux
@ 2017-11-30  8:51             ` Yan Markman
  2017-11-30 10:10               ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Yan Markman @ 2017-11-30  8:51 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Antoine Tenart, andrew, f.fainelli, davem, gregory.clement,
	thomas.petazzoni, miquel.raynal, Nadav Haklai, mw,
	Stefan Chulski, netdev, linux-kernel

phylink_stop() is called before phylink_disconnect_phy().
You can see this in mvpp2.c:

mvpp2_stop_dev() {
	phylink_stop(port->phylink);
}

mvpp2_stop() {
	mvpp2_stop_dev(port);
	phylink_disconnect_phy(port->phylink);
}

.ndo_stop = mvpp2_stop,


* Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-30  8:51             ` Yan Markman
@ 2017-11-30 10:10               ` Russell King - ARM Linux
  2017-11-30 13:28                 ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2017-11-30 10:10 UTC (permalink / raw)
  To: Yan Markman
  Cc: Antoine Tenart, andrew, f.fainelli, davem, gregory.clement,
	thomas.petazzoni, miquel.raynal, Nadav Haklai, mw,
	Stefan Chulski, netdev, linux-kernel

On Thu, Nov 30, 2017 at 08:51:21AM +0000, Yan Markman wrote:
> The phylink_stop is called before phylink_disconnect_phy
> You could see in mvpp2.c:
> 
> mvpp2_stop_dev() {
> 	phylink_stop(port->phylink); 
> }
> 
> mvpp2_stop()       { 
> 	mvpp2_stop_dev(port);
> 	phylink_disconnect_phy(port->phylink);
> }
> 
> .ndo_stop = mvpp2_stop,

Sorry, I don't have this in mvpp2.c, so I have no visibility of what
you're working with.

What you have above looks correct, and I see no reason why the p21
patch would not have resolved your issue.  The p21 patch ensures
that phylink_resolve() gets called and completes before phylink_stop()
returns.  In that case, phylink_resolve() will call the mac_link_down()
method if the link is not already down.  It will also print the "Link
is Down" message.
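The behaviour described above reduces to an edge-triggered check: the MAC call and the "Link is Down" message only happen when the resolved state differs from the current carrier state, so a second resolve is a no-op. A toy model assuming exactly that rule; `model_resolve()` and its returned strings are hypothetical and are not phylink's API.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_phylink {
	bool stopped;  /* stands in for PHYLINK_DISABLE_STOPPED being set */
	bool phy_link; /* link state reported by the PHY */
	bool carrier;  /* stands in for netif_carrier_ok() */
};

/*
 * Toy resolve: returns the message that would be logged, or NULL when
 * nothing changed. Only a transition calls into the MAC and logs.
 */
static const char *model_resolve(struct model_phylink *pl)
{
	bool link = !pl->stopped && pl->phy_link;

	if (link == pl->carrier)
		return NULL;
	pl->carrier = link;
	return link ? "Link is Up" : "Link is Down";
}
```

Under this rule, "Link is Down" can only appear if a resolve actually runs after the stopped state is set, which is what the p21 patch is meant to guarantee.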

Florian has already tested this patch after encountering a similar
issue, and has reported that it solves the problem for him.  I've also
tested it with mvneta, and the original mvpp2x driver on Macchiatobin.

Maybe there's something different about mvpp2, but as I have no
visibility of that driver and the modifications therein, I can't
comment further other than stating that it works for three different
implementations.

Maybe you could try and work out what's going on with the p21 patch
in your case?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

* Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-30 10:10               ` Russell King - ARM Linux
@ 2017-11-30 13:28                 ` Russell King - ARM Linux
  2017-12-01 17:07                   ` Grygorii Strashko
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2017-11-30 13:28 UTC
  To: Yan Markman
  Cc: Antoine Tenart, andrew, f.fainelli, davem, gregory.clement,
	thomas.petazzoni, miquel.raynal, Nadav Haklai, mw,
	Stefan Chulski, netdev, linux-kernel

On Thu, Nov 30, 2017 at 10:10:18AM +0000, Russell King - ARM Linux wrote:
> On Thu, Nov 30, 2017 at 08:51:21AM +0000, Yan Markman wrote:
> > The phylink_stop is called before phylink_disconnect_phy
> > You could see in mvpp2.c:
> > 
> > mvpp2_stop_dev() {
> > 	phylink_stop(port->phylink); 
> > }
> > 
> > mvpp2_stop()       { 
> > 	mvpp2_stop_dev(port);
> > 	phylink_disconnect_phy(port->phylink);
> > }
> > 
> > .ndo_stop = mvpp2_stop,
> 
> Sorry, I don't have this in mvpp2.c, so I have no visibility of what
> you're working with.
> 
> What you have above looks correct, and I see no reason why the p21
> patch would not have resolved your issue.  The p21 patch ensures
> that phylink_resolve() gets called and completes before phylink_stop()
> returns.  In that case, phylink_resolve() will call the mac_link_down()
> method if the link is not already down.  It will also print the "Link
> is Down" message.
> 
> Florian has already tested this patch after encountering a similar
> issue, and has reported that it solves the problem for him.  I've also
> tested it with mvneta, and the original mvpp2x driver on Macchiatobin.
> 
> Maybe there's something different about mvpp2, but as I have no
> visibility of that driver and the modifications therein, I can't
> comment further other than stating that it works for three different
> implementations.
> 
> Maybe you could try and work out what's going on with the p21 patch
> in your case?

I think I now realise what's probably going on.

If you call netif_carrier_off() before phylink_stop(), then phylink will
believe that the link is already down, and so it won't bother calling
mac_link_down() - it will believe that the link is already down.

I'll update the documentation for phylink_stop() to spell out this
aspect.
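
A minimal userspace model of this failure mode. It assumes, as described here, that phylink decides whether the link changed by comparing its resolved link state with the carrier flag. All names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical model; not the real phylink API. */
struct carrier_model {
	bool carrier;            /* stands in for netif_carrier_ok() */
	int mac_link_down_calls; /* times the MAC link-down hook ran */
};

/* Resolve after stop: the link is forced down, and the MAC hook only runs
 * if the carrier still says the link is up. */
static void model_stop_resolve(struct carrier_model *m)
{
	bool link = false;

	if (link != m->carrier) {
		m->carrier = link;
		m->mac_link_down_calls++;
	}
}

/* Problematic driver ordering: carrier dropped before the stop call. */
static void ndo_stop_carrier_first(struct carrier_model *m)
{
	m->carrier = false;      /* netif_carrier_off() */
	model_stop_resolve(m);   /* phylink_stop(): sees "already down" */
}

/* Suggested ordering: leave the carrier to phylink. */
static void ndo_stop_phylink_only(struct carrier_model *m)
{
	model_stop_resolve(m);
}
```

In the first ordering the MAC's link-down hook never runs, which matches the missing mac_link_down() call reported earlier in the thread.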

* Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-11-30 13:28                 ` Russell King - ARM Linux
@ 2017-12-01 17:07                   ` Grygorii Strashko
  2017-12-01 17:24                     ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Grygorii Strashko @ 2017-12-01 17:07 UTC
  To: Russell King - ARM Linux, Yan Markman
  Cc: Antoine Tenart, andrew, f.fainelli, davem, gregory.clement,
	thomas.petazzoni, miquel.raynal, Nadav Haklai, mw,
	Stefan Chulski, netdev, linux-kernel

Hi Russell,

On 11/30/2017 07:28 AM, Russell King - ARM Linux wrote:
> I think I now realise what's probably going on.
> 
> If you call netif_carrier_off() before phylink_stop(), then phylink will
> believe that the link is already down, and so it won't bother calling
> mac_link_down() - it will believe that the link is already down.
> 
> I'll update the documentation for phylink_stop() to spell out this
> aspect.
> 

A fairly large number of net drivers call
	netif_carrier_off(dev);
before
	phy_stop(dev->phydev);
in their .ndo_stop() callback.

As per your comment, this seems to be incorrect, so should such calls be removed?

-- 
regards,
-grygorii

* Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-12-01 17:07                   ` Grygorii Strashko
@ 2017-12-01 17:24                     ` Russell King - ARM Linux
  2017-12-01 17:36                       ` Florian Fainelli
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2017-12-01 17:24 UTC
  To: Grygorii Strashko
  Cc: Yan Markman, Antoine Tenart, andrew, f.fainelli, davem,
	gregory.clement, thomas.petazzoni, miquel.raynal, Nadav Haklai,
	mw, Stefan Chulski, netdev, linux-kernel

On Fri, Dec 01, 2017 at 11:07:22AM -0600, Grygorii Strashko wrote:
> Hi Russell,
> 
> On 11/30/2017 07:28 AM, Russell King - ARM Linux wrote:
> > I think I now realise what's probably going on.
> > 
> > If you call netif_carrier_off() before phylink_stop(), then phylink will
> > believe that the link is already down, and so it won't bother calling
> > mac_link_down() - it will believe that the link is already down.
> > 
> > I'll update the documentation for phylink_stop() to spell out this
> > aspect.
> > 
> 
> A fairly large number of net drivers call
> 	netif_carrier_off(dev);
> before
> 	phy_stop(dev->phydev);
> in their .ndo_stop() callback.
>
> As per your comment, this seems to be incorrect, so should such calls be
> removed?

Well, I think the question that needs to be asked is this:

  Is calling netif_carrier_off() before phy_stop() safe?

Well, reading the phylib code, this is the answer I've come to:

  Between phy_start() and phy_stop(), phylib is free to manage the
  carrier state itself through the phylib state machine.

  This means if you call netif_carrier_off() prior to phy_stop(),
  there is nothing preventing the phylib state machine from running,
  and a co-incident poll of the PHY could notice that the link has
  come up, and re-enable the carrier while your ndo_stop() method
  is still running.

So, my conclusion is that this practice is provably racy, though
it's probably not that easy to trigger the race (which is probably
why no one has reported the problem.)

Given that it's racy, it's not something that I think phylink should
care about, and should "softly" discourage it.  So, I'm happy with
what phylink is doing here, and I suggest fixing the drivers for
this race.

In any case, it should result in less code in the drivers - since
the work you need to do when the link goes down is a subset of the
work you need to do when the network interface is taken down.
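
A minimal model of that point, assuming the driver's link-down hook disables ingress and egress: if ndo_stop() calls the stop function first, the link-down work is done once by that hook and ndo_stop() only has to do what is left. The struct, fields and helpers are hypothetical stand-ins, not a real driver's:

```c
#include <stdbool.h>

/* Hypothetical MAC state; names are illustrative only. */
struct mac_model {
	bool link_up; /* phylink's view of the link (the carrier) */
	bool egress;
	bool ingress;
	bool port;
	int link_down_calls;
};

/* Link-down hook: the work needed whenever the link drops. */
static void model_mac_link_down(struct mac_model *m)
{
	m->egress = false;
	m->ingress = false;
	m->link_down_calls++;
}

/* Stand-in for phylink_stop(): runs the hook only if the link was up. */
static void model_phylink_stop(struct mac_model *m)
{
	if (m->link_up) {
		m->link_up = false;
		model_mac_link_down(m);
	}
}

/* ndo_stop(): stop phylink first, then only the extra interface teardown. */
static void model_ndo_stop(struct mac_model *m)
{
	model_phylink_stop(m);
	m->port = false; /* work that goes beyond link-down */
}
```

The driver never repeats the egress/ingress shutdown in its own stop path, which is where the code saving comes from.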

* Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-12-01 17:24                     ` Russell King - ARM Linux
@ 2017-12-01 17:36                       ` Florian Fainelli
  2017-12-01 17:47                         ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Florian Fainelli @ 2017-12-01 17:36 UTC
  To: Russell King - ARM Linux, Grygorii Strashko
  Cc: Yan Markman, Antoine Tenart, andrew, davem, gregory.clement,
	thomas.petazzoni, miquel.raynal, Nadav Haklai, mw,
	Stefan Chulski, netdev, linux-kernel

On 12/01/2017 09:24 AM, Russell King - ARM Linux wrote:
> On Fri, Dec 01, 2017 at 11:07:22AM -0600, Grygorii Strashko wrote:
>> Hi Russell,
>>
>> On 11/30/2017 07:28 AM, Russell King - ARM Linux wrote:
>>> I think I now realise what's probably going on.
>>>
>>> If you call netif_carrier_off() before phylink_stop(), then phylink will
>>> believe that the link is already down, and so it won't bother calling
>>> mac_link_down() - it will believe that the link is already down.
>>>
>>> I'll update the documentation for phylink_stop() to spell out this
>>> aspect.
>>>
>>
>> A fairly large number of net drivers call
>> 	netif_carrier_off(dev);
>> before
>> 	phy_stop(dev->phydev);
>> in their .ndo_stop() callback.
>>
>> As per your comment, this seems to be incorrect, so should such calls be
>> removed?
> 
> Well, I think the question that needs to be asked is this:
> 
>   Is calling netif_carrier_off() before phy_stop() safe?
> 
> Well, reading the phylib code, this is the answer I've come to:
> 
>   Between phy_start() and phy_stop(), phylib is free to manage the
>   carrier state itself through the phylib state machine.
> 
>   This means if you call netif_carrier_off() prior to phy_stop(),
>   there is nothing preventing the phylib state machine from running,
>   and a co-incident poll of the PHY could notice that the link has
>   come up, and re-enable the carrier while your ndo_stop() method
>   is still running.
> 
> So, my conclusion is that this practice is provably racy, though
> it's probably not that easy to trigger the race (which is probably
> why no one has reported the problem.)
> 
> Given that it's racy, it's not something that I think phylink should
> care about, and should "softly" discourage it.  So, I'm happy with
> what phylink is doing here, and I suggest fixing the drivers for
> this race.
> 
> In any case, it should result in less code in the drivers - since
> the work you need to do when the link goes down is a subset of the
> work you need to do when the network interface is taken down.
> 

While I agree with everything written above, in practice calling
netif_carrier_off() when using PHYLIB can at most cause inconsistent
carrier states; it would not mess up the state machine itself, because
PHYLIB does not make use of netif_carrier_ok() to decide whether the
link has dropped or not: it bases that decision solely on phydev->link.

This is not true with PHYLINK, which is why the problem was observed here.
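
A minimal sketch of the distinction, assuming each framework's "was the link up?" input is exactly the source named above (phydev->link for PHYLIB, the carrier flag for PHYLINK). Both functions are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical PHYLIB-style decision: old state taken from phydev->link. */
static bool phylib_reports_link_drop(bool phydev_link, bool new_link)
{
	return phydev_link && !new_link;
}

/* Hypothetical PHYLINK-style decision: old state taken from the carrier. */
static bool phylink_reports_link_drop(bool carrier_ok, bool new_link)
{
	return carrier_ok && !new_link;
}
```

With the link previously up and the driver having already called netif_carrier_off(), a drop is still reported in the first case but not in the second.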
-- 
Florian

* Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-12-01 17:36                       ` Florian Fainelli
@ 2017-12-01 17:47                         ` Russell King - ARM Linux
  2017-12-02 11:08                           ` Yan Markman
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2017-12-01 17:47 UTC
  To: Florian Fainelli
  Cc: Grygorii Strashko, Yan Markman, Antoine Tenart, andrew, davem,
	gregory.clement, thomas.petazzoni, miquel.raynal, Nadav Haklai,
	mw, Stefan Chulski, netdev, linux-kernel

On Fri, Dec 01, 2017 at 09:36:42AM -0800, Florian Fainelli wrote:
> On 12/01/2017 09:24 AM, Russell King - ARM Linux wrote:
> > On Fri, Dec 01, 2017 at 11:07:22AM -0600, Grygorii Strashko wrote:
> >> Hi Russell,
> >>
> >> On 11/30/2017 07:28 AM, Russell King - ARM Linux wrote:
> >>> I think I now realise what's probably going on.
> >>>
> >>> If you call netif_carrier_off() before phylink_stop(), then phylink will
> >>> believe that the link is already down, and so it won't bother calling
> >>> mac_link_down() - it will believe that the link is already down.
> >>>
> >>> I'll update the documentation for phylink_stop() to spell out this
> >>> aspect.
> >>>
> >>
> >> A fairly large number of net drivers call
> >> 	netif_carrier_off(dev);
> >> before
> >> 	phy_stop(dev->phydev);
> >> in their .ndo_stop() callback.
> >>
> >> As per your comment, this seems to be incorrect, so should such calls be
> >> removed?
> > 
> > Well, I think the question that needs to be asked is this:
> > 
> >   Is calling netif_carrier_off() before phy_stop() safe?
> > 
> > Well, reading the phylib code, this is the answer I've come to:
> > 
> >   Between phy_start() and phy_stop(), phylib is free to manage the
> >   carrier state itself through the phylib state machine.
> > 
> >   This means if you call netif_carrier_off() prior to phy_stop(),
> >   there is nothing preventing the phylib state machine from running,
> >   and a co-incident poll of the PHY could notice that the link has
> >   come up, and re-enable the carrier while your ndo_stop() method
> >   is still running.
> > 
> > So, my conclusion is that this practice is provably racy, though
> > it's probably not that easy to trigger the race (which is probably
> > why no one has reported the problem.)
> > 
> > Given that it's racy, it's not something that I think phylink should
> > care about, and should "softly" discourage it.  So, I'm happy with
> > what phylink is doing here, and I suggest fixing the drivers for
> > this race.
> > 
> > In any case, it should result in less code in the drivers - since
> > the work you need to do when the link goes down is a subset of the
> > work you need to do when the network interface is taken down.
> > 
> 
> While I agree with everything written above, in practice calling
> netif_carrier_off() when using PHYLIB can at most cause inconsistent
> carrier states; it would not mess up the state machine itself, because
> PHYLIB does not make use of netif_carrier_ok() to decide whether the
> link has dropped or not: it bases that decision solely on phydev->link.

Indeed, but the point I'm making is that this sequence is very
possible with drivers that mess about by fiddling with stuff
before they call phy_stop():

	CPU0					CPU1
	netif_carrier_off()
	mvpp2_egress_disable()
						phy_state_machine()
						 (phydev->state = PHY_AN)
						phy_link_up()
						phy_link_change()
						netif_carrier_on()
						mvpp2_link_event()
						mvpp2_egress_enable()
						mvpp2_ingress_enable()
	mvpp2_port_disable()
	phy_stop(ndev->phydev)

At this point, egress has not been disabled as mvpp2_stop_dev() wants,
because the phylib state machine got in before it was stopped, called
the adjust link function which then had the effect of re-enabling the
egress.

If that doesn't matter, then what's the point of the
mvpp2_egress_disable() call in the mvpp2_stop_dev() path... either
it matters and the mvpp2_stop_dev() sequence is broken, or it doesn't
matter and some of the work that mvpp2_stop_dev() is doing is unnecessary.
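
The interleaving above can be replayed deterministically in a small userspace model. The struct and helpers are hypothetical stand-ins (the comments name the real calls they mimic), and the model assumes that once phy_stop() has run, phylib no longer reports the link as coming up:

```c
#include <stdbool.h>

/* Hypothetical replay of the CPU0/CPU1 interleaving; not real driver code. */
struct port_model {
	bool carrier;
	bool egress;
	bool port;
	bool phy_running; /* phylib state machine may still run */
};

/* CPU1: a phylib poll sees the link up and runs the link-change path. */
static void phy_poll_link_up(struct port_model *p)
{
	if (!p->phy_running)
		return;         /* assumed: no link-up reports after phy_stop() */
	p->carrier = true;  /* netif_carrier_on() */
	p->egress = true;   /* mvpp2_link_event() -> mvpp2_egress_enable() */
}

/* CPU0: the ordering from the timeline, with the poll landing mid-way. */
static void stop_racy(struct port_model *p)
{
	p->carrier = false;     /* netif_carrier_off() */
	p->egress = false;      /* mvpp2_egress_disable() */
	phy_poll_link_up(p);    /* CPU1 gets in here */
	p->port = false;        /* mvpp2_port_disable() */
	p->phy_running = false; /* phy_stop() */
}

/* CPU0: stop the PHY first; the same poll is then harmless. */
static void stop_phy_first(struct port_model *p)
{
	p->phy_running = false; /* phy_stop() */
	phy_poll_link_up(p);    /* CPU1 poll, now a no-op */
	p->carrier = false;
	p->egress = false;
	p->port = false;
}
```

In the first ordering the port ends up with egress (and the carrier) re-enabled behind CPU0's back; in the second, the late poll cannot undo the teardown.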

* RE: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-12-01 17:47                         ` Russell King - ARM Linux
@ 2017-12-02 11:08                           ` Yan Markman
  2017-12-02 14:58                             ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Yan Markman @ 2017-12-02 11:08 UTC
  To: Russell King - ARM Linux, Florian Fainelli
  Cc: Grygorii Strashko, Antoine Tenart, andrew, davem,
	gregory.clement, thomas.petazzoni, miquel.raynal, Nadav Haklai,
	mw, Stefan Chulski, netdev, linux-kernel

Hi Russell,
Grygorii has raised an additional point (about netif_carrier_off); I just didn't want to start on it before finishing the previous one.
On ifconfig down, mac_config() is called, but with LINK=0.
mac_config() has no knowledge of what the intention is (up or down), and the configuration should be done with ingress/egress disabled,
so one of the actions mac_config() takes is netif_carrier_off().

After calling mac_config(), phylink checks  if (!link  &&  !netif_carrier_ok()) and decides not to go any further with taking the link down, since it considers that already done...

Removing netif_carrier_off() looks correct, BUT there are cases where the driver then stops working properly (sorry, I can't remember exactly what they are now).
So in the end I placed a CONDITIONAL carrier-off there, depending on the link:

static void mvpp2_mac_config(){
	if (state->link)        --- occasionally is TRUE on UP but FALSE on down
		netif_carrier_off(port->dev);//YANM

BTW: it seems your patch below should be applied anyway.
+++ b/drivers/net/phy/phylink.c
@@ -798,6 +798,7 @@ void phylink_disconnect_phy(struct phylink *pl)
+		pl->phy_state.link = false;

Thank you
Best regards
Yan Markman

* Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
  2017-12-02 11:08                           ` Yan Markman
@ 2017-12-02 14:58                             ` Russell King - ARM Linux
  0 siblings, 0 replies; 19+ messages in thread
From: Russell King - ARM Linux @ 2017-12-02 14:58 UTC
  To: Yan Markman
  Cc: Florian Fainelli, Grygorii Strashko, Antoine Tenart, andrew,
	davem, gregory.clement, thomas.petazzoni, miquel.raynal,
	Nadav Haklai, mw, Stefan Chulski, netdev, linux-kernel

On Sat, Dec 02, 2017 at 11:08:45AM +0000, Yan Markman wrote:
> Hi Russell,
>
> Grygorii has raised an additional point (about netif_carrier_off);
> I just didn't want to start on it before finishing the previous one.
>
> On ifconfig down, mac_config() is called, but with LINK=0.
> mac_config() has no knowledge of what the intention is (up or down),
> and the configuration should be done with ingress/egress disabled, so
> one of the actions mac_config() takes is netif_carrier_off().

With the "p21" patch applied, which is now queued for 4.15-rc by davem,
the behaviour of phylink when phylink_stop() is called becomes entirely
predictable.

When phylink_stop() has been called, provided the carrier state is left
alone, it is guaranteed that mac_link_down() will be called if the link
was originally up, and this will complete prior to phylink_stop()
returning.

After that call has been made, and provided no further calls from the
MAC driver to phylink are made, phylink will make no further calls
to the MAC driver via mac_config(), mac_link_up() or mac_link_down().

It will only resume making these calls once phylink_start() is called.
phylink_start() will cause mac_config() to be called for the current
link mode.  A resolve of the current state is then triggered, which
may trigger further mac_config() calls to be made.  If the link is
then deemed to be up, a call to mac_link_up() will be made.
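
The call contract above can be sketched as a small userspace model that records the MAC operations in order. This is a simplified illustration under stated assumptions: the names are hypothetical, resolve is assumed to run to completion inside the stop call, and the model always issues mac_config() on a resolve while started, whereas the text above only says further mac_config() calls "may" happen:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the MAC-op call contract; not the real API. */
#define LOG_MAX 8

struct contract_model {
	bool started;
	bool carrier;            /* left alone by the MAC driver */
	bool phy_link;           /* link state reported by the PHY */
	const char *log[LOG_MAX];
	int n;
};

static void record(struct contract_model *m, const char *op)
{
	if (m->n < LOG_MAX)
		m->log[m->n++] = op;
}

static bool log_is(const struct contract_model *m, int i, const char *op)
{
	return i < m->n && strcmp(m->log[i], op) == 0;
}

/* Resolve: while stopped the link is forced down and no mac_config() is
 * issued; while started this sketch always reconfigures first. */
static void model_resolve(struct contract_model *m)
{
	bool link = false;

	if (m->started) {
		link = m->phy_link;
		record(m, "mac_config");
	}
	if (link != m->carrier) {
		m->carrier = link;
		record(m, link ? "mac_link_up" : "mac_link_down");
	}
}

static void model_stop(struct contract_model *m)
{
	m->started = false;
	model_resolve(m);          /* completes before returning */
}

static void model_start(struct contract_model *m)
{
	m->started = true;
	record(m, "mac_config");   /* configure for the current link mode */
	model_resolve(m);          /* then resolve the current state */
}
```

Starting from a link that is up: stopping records a single mac_link_down, a PHY event while stopped records nothing, and restarting records mac_config, mac_config, mac_link_up.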

> After calling mac_config(), phylink checks
>   if (!link  &&  !netif_carrier_ok())
> and decides not to go any further with taking the link down, since
> it considers that already done...

phylink does not contain any such if () statement, so I'm not sure
what code you are referring to.

> Removing netif_carrier_off() looks correct, BUT there are cases where the driver then stops working properly (sorry, I can't remember exactly what they are now).
> So in the end I placed a CONDITIONAL carrier-off there, depending on the link:
> 
> static void mvpp2_mac_config(){
> 	if (state->link)        --- occasionally is TRUE on UP but FALSE on down
> 		netif_carrier_off(port->dev);//YANM

You should not be changing the carrier state in your mac_config()
function, because, again, just like having netif_carrier_off() before
phylink_stop(), it will mess phylink's tracking of the current state
and will cause the mac_link_*() functions to be called erratically.
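
A minimal model of the erratic calls, assuming (for illustration only) that each resolve pass calls mac_config() and then compares the resolved link with the carrier, as described earlier in the thread. The mac_config() stand-in mimics the conditional netif_carrier_off() quoted above; all names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical model; not the real phylink or mvpp2 code. */
struct erratic_model {
	bool carrier;                /* stands in for netif_carrier_ok() */
	bool config_touches_carrier; /* driver drops carrier in mac_config() */
	int mac_link_up_calls;
};

/* Stand-in for the driver's mac_config(). */
static void model_mac_config(struct erratic_model *m, bool link)
{
	if (m->config_touches_carrier && link)
		m->carrier = false;  /* netif_carrier_off(port->dev) */
}

/* Stand-in for one resolve pass. */
static void model_resolve(struct erratic_model *m, bool link)
{
	model_mac_config(m, link);
	if (link != m->carrier) {
		m->carrier = link;
		if (link)
			m->mac_link_up_calls++;
	}
}
```

With the link steadily up, the driver that touches the carrier sees mac_link_up() on every resolve pass, while the one that leaves the carrier to phylink sees it once.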

> BTW: it seems your patch below should be applied anyway.
> +++ b/drivers/net/phy/phylink.c
> @@ -798,6 +798,7 @@ void phylink_disconnect_phy(struct phylink *pl)
> +		pl->phy_state.link = false;

Here's an example, without the above change, of an up -> down -> up
sequence on the gigabit wired ethernet port of a Macchiatobin board
(which I have bound to a Linux bridge device).  The exact command used
for this was:

# ifconfig eth2 down; sleep 2; ifconfig eth2 up

[66926.127009] mvpp2x f4000000.ppv22 eth2: Link is Down
[66926.131557] br0: port 1(eth2) entered disabled state
[66928.144845] mvpp2x f4000000.ppv22 eth2: configuring for inband/sgmii link mode
[66928.144853] mvpp2x f4000000.ppv22 eth2: reconfig: pm 4->4 cm 201->201 f 2->2
[66928.154937] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
[66929.783866] IPv6: ADDRCONF(NETDEV_UP): br0: link is not ready
[66929.979499] IPv6: ADDRCONF(NETDEV_UP): br0: link is not ready
[66931.213407] mvpp2x f4000000.ppv22 eth2: reconfig: pm 4->4 cm 201->201 f a->a
[66931.213424] mvpp2x f4000000.ppv22 eth2: Link is Up - 1Gbps/Full - flow control off
[66931.213433] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[66931.213682] br0: port 1(eth2) entered blocking state
[66931.213685] br0: port 1(eth2) entered forwarding state
[66931.213920] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready

This is with the "p21" patch applied, and the netif_carrier_off()
before phylink_stop() in mvpp2x removed.  Basically:

void mv_pp2x_stop_dev(struct mv_pp2x_port *port)
{
        struct gop_hw *gop = &port->priv->hw.gop;
        struct mv_mac_data *mac = &port->mac_data;

        if (port->mac_data.phylink) {
                phylink_stop(port->mac_data.phylink);

                /* Disable interrupts on all CPUs */
                mv_pp2x_port_interrupts_disable(port);
                mv_pp2x_port_napi_disable(port);
                netif_tx_stop_all_queues(port->dev);
        } else {
                /* Stop new packets from arriving to RXQs */
                mv_pp2x_ingress_disable(port);

                mdelay(10);

                /* Disable interrupts on all CPUs */
                mv_pp2x_port_interrupts_disable(port);

                mv_pp2x_port_napi_disable(port);

                netif_carrier_off(port->dev);
                netif_tx_stop_all_queues(port->dev);

                mv_pp2x_egress_disable(port);
        }

        if (port->comphy)
                phy_power_off(port->comphy);

        if (port->priv->pp2_version == PPV21) {
                mv_pp21_port_disable(port);
        } else {
                mv_gop110_port_events_mask(gop, mac);
                mv_gop110_port_disable(gop, mac);
                port->mac_data.flags &= ~MV_EMAC_F_LINK_UP;
                port->mac_data.flags &= ~MV_EMAC_F_PORT_UP;
        }

        if (!port->mac_data.phylink) {
                if (port->mac_data.phy_dev)
                        phy_stop(port->mac_data.phy_dev);
                else
                        tasklet_kill(&port->link_change_tasklet);
        }
}

with the mac_link_down() being:

static void mv_pp22_mac_link_down(struct net_device *dev, unsigned int mode)
{
        struct mv_pp2x_port *port = netdev_priv(dev);

        port->mac_data.link = 0;
        mv_pp2x_ingress_disable(port);
        mv_pp2x_egress_disable(port);
        port->mac_data.flags &= ~MV_EMAC_F_LINK_UP;
}

The phylink case is the same as the non-phylink case, but with the
ingress/egress disable in a slightly different position (in
mac_link_down()), as it would be if the interface were taken down
without the link being up.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


end of thread, other threads:[~2017-12-02 14:59 UTC | newest]

Thread overview: 19+ messages
2017-11-28 13:29 [PATCH net] net: phylink: fix link state on phy-connect Antoine Tenart
2017-11-28 13:56 ` Andrew Lunn
2017-11-28 14:10   ` Antoine Tenart
2017-11-28 15:53 ` Russell King
2017-11-28 15:56   ` Russell King
2017-11-29  7:22     ` Antoine Tenart
2017-11-29 19:33     ` [EXT] " Yan Markman
2017-11-29 19:59       ` Russell King - ARM Linux
2017-11-29 21:06         ` [EXT] " Yan Markman
2017-11-29 21:20           ` Russell King - ARM Linux
2017-11-30  8:51             ` Yan Markman
2017-11-30 10:10               ` Russell King - ARM Linux
2017-11-30 13:28                 ` Russell King - ARM Linux
2017-12-01 17:07                   ` Grygorii Strashko
2017-12-01 17:24                     ` Russell King - ARM Linux
2017-12-01 17:36                       ` Florian Fainelli
2017-12-01 17:47                         ` Russell King - ARM Linux
2017-12-02 11:08                           ` Yan Markman
2017-12-02 14:58                             ` Russell King - ARM Linux
