From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751828AbdLBLJJ convert rfc822-to-8bit (ORCPT );
	Sat, 2 Dec 2017 06:09:09 -0500
Received: from mx0b-0016f401.pphosted.com ([67.231.156.173]:47964 "EHLO
	mx0b-0016f401.pphosted.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751740AbdLBLJF (ORCPT );
	Sat, 2 Dec 2017 06:09:05 -0500
From: Yan Markman
To: Russell King - ARM Linux , Florian Fainelli
CC: Grygorii Strashko , Antoine Tenart , "andrew@lunn.ch" ,
	"davem@davemloft.net" , "gregory.clement@free-electrons.com" ,
	"thomas.petazzoni@free-electrons.com" ,
	"miquel.raynal@free-electrons.com" , "Nadav Haklai" ,
	"mw@semihalf.com" , "Stefan Chulski" , "netdev@vger.kernel.org" ,
	"linux-kernel@vger.kernel.org"
Subject: RE: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
Thread-Topic: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
Thread-Index: AQHTaE0J1bz47nEl5EqXJeKzEHOhkKMpz+GAgAAA0ICAAeUZ0P//8SEAgAAt2KD//+jiAIAA4Y6g///1hAAABuwZAAA572wAAACarQAAAGuWAAAAYLWAACeXWFA=
Date: Sat, 2 Dec 2017 11:08:45 +0000
Message-ID:
References: <20448667430e434aad5bb8cd1b082611@IL-EXCH01.marvell.com>
	<20171129195911.GG8356@n2100.armlinux.org.uk>
	<21ec97be76d54a6c8a80fd5b56d35678@IL-EXCH01.marvell.com>
	<20171129212032.GI8356@n2100.armlinux.org.uk>
	<20171130101018.GA10595@n2100.armlinux.org.uk>
	<20171130132830.GA5529@n2100.armlinux.org.uk>
	<51496048-7e11-e0aa-7c47-3c04eee70e3a@ti.com>
	<20171201172440.GK10595@n2100.armlinux.org.uk>
	<221420a4-9f56-373e-f5cd-0d2fcb02e5fb@gmail.com>
	<20171201174730.GM10595@n2100.armlinux.org.uk>
In-Reply-To: <20171201174730.GM10595@n2100.armlinux.org.uk>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-ms-exchange-transport-fromentityheader: Hosted
x-originating-ip: [199.203.130.14]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8BIT
MIME-Version: 1.0
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2017-12-02_05:,, signatures=0
X-Proofpoint-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1011 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1712020164
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Russell,

Grygorii has raised one additional point (about netif_carrier_off()).
I just didn't want to start on it before finishing the previous one.

On ifconfig down, mac_config() is called, but with LINK=0.
The config has no knowledge of the intention (up or down) and should be
done with ingress/egress disabled, so one of the actions taken in
mac_config() is netif_carrier_off().
After calling mac_config(), phylink checks
   if (!link && !netif_carrier_ok())
and decides to abort any further "down" handling, since everything is
already done...

Removing netif_carrier_off() looks correct, BUT there are cases where the
driver stops working properly (sorry, I can't remember now what exactly).
So in the end I placed a CONDITIONAL carrier-off there, depending upon link:

static void mvpp2_mac_config(){
	if (state->link) /* occasionally TRUE on up, but FALSE on down */
		netif_carrier_off(port->dev);//YANM

BTW: It seems your patch below should be present anyway.
+++ b/drivers/net/phy/phylink.c
@@ -798,6 +798,7 @@ void phylink_disconnect_phy(struct phylink *pl)
+	pl->phy_state.link = false;

Thank you
Best regards
Yan Markman

-----Original Message-----
From: Russell King - ARM Linux [mailto:linux@armlinux.org.uk]
Sent: Friday, December 01, 2017 7:48 PM
To: Florian Fainelli
Cc: Grygorii Strashko ; Yan Markman ; Antoine Tenart ; andrew@lunn.ch; davem@davemloft.net; gregory.clement@free-electrons.com; thomas.petazzoni@free-electrons.com; miquel.raynal@free-electrons.com; Nadav Haklai ; mw@semihalf.com; Stefan Chulski ; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect

On Fri, Dec 01, 2017 at 09:36:42AM -0800, Florian Fainelli wrote:
> On 12/01/2017 09:24 AM, Russell King - ARM Linux wrote:
> > On Fri, Dec 01, 2017 at 11:07:22AM -0600, Grygorii Strashko wrote:
> >> Hi Russell,
> >>
> >> On 11/30/2017 07:28 AM, Russell King - ARM Linux wrote:
> >>> On Thu, Nov 30, 2017 at 10:10:18AM +0000, Russell King - ARM Linux wrote:
> >>>> On Thu, Nov 30, 2017 at 08:51:21AM +0000, Yan Markman wrote:
> >>>>> The phylink_stop is called before phylink_disconnect_phy You
> >>>>> could see in mvpp2.c:
> >>>>>
> >>>>> mvpp2_stop_dev() {
> >>>>>     phylink_stop(port->phylink);
> >>>>> }
> >>>>>
> >>>>> mvpp2_stop() {
> >>>>>     mvpp2_stop_dev(port);
> >>>>>     phylink_disconnect_phy(port->phylink);
> >>>>> }
> >>>>>
> >>>>> .ndo_stop = mvpp2_stop,
> >>>>
> >>>> Sorry, I don't have this in mvpp2.c, so I have no visibility of
> >>>> what you're working with.
> >>>>
> >>>> What you have above looks correct, and I see no reason why the
> >>>> p21 patch would not have resolved your issue. The p21 patch
> >>>> ensures that phylink_resolve() gets called and completes before
> >>>> phylink_stop() returns. In that case, phylink_resolve() will
> >>>> call the mac_link_down() method if the link is not already down.
> >>>> It will also print the "Link is Down" message.
> >>>>
> >>>> Florian has already tested this patch after encountering a
> >>>> similar issue, and has reported that it solves the problem for
> >>>> him. I've also tested it with mvneta, and the original mvpp2x driver on Macchiatobin.
> >>>>
> >>>> Maybe there's something different about mvpp2, but as I have no
> >>>> visibility of that driver and the modifications therein, I can't
> >>>> comment further other than stating that it works for three
> >>>> different implementations.
> >>>>
> >>>> Maybe you could try and work out what's going on with the p21
> >>>> patch in your case?
> >>>
> >>> I think I now realise what's probably going on.
> >>>
> >>> If you call netif_carrier_off() before phylink_stop(), then
> >>> phylink will believe that the link is already down, and so it
> >>> won't bother calling
> >>> mac_link_down() - it will believe that the link is already down.
> >>>
> >>> I'll update the documentation for phylink_stop() to spell out this
> >>> aspect.
> >>>
> >>
> >> There are pretty high number of net drivers which do call
> >> netif_carrier_off(dev);
> >> before
> >> phy_stop(dev->phydev);
> >> in .ndo_stop() callback.
> >>
> >> As per you comment this seems to be incorrect, so should such calls
> >> be removed?
> >
> > Well, I think the question that needs to be asked is this:
> >
> > Is calling netif_carrier_off() before phy_stop() safe?
> >
> > Well, reading the phylib code, this is the answer I've come to:
> >
> > Between phy_start() and phy_stop(), phylib is free to manage the
> > carrier state itself through the phylib state machine.
> >
> > This means if you call netif_carrier_off() prior to phy_stop(),
> > there is nothing preventing the phylib state machine from running,
> > and a co-incident poll of the PHY could notice that the link has
> > come up, and re-enable the carrier while your ndo_stop() method
> > is still running.
> >
> > So, my conclusion is that this practice is provably racy, though
> > it's probably not that easy to trigger the race (which is probably
> > why no one has reported the problem.)
> >
> > Given that it's racy, it's not something that I think phylink should
> > care about, and should "softly" discourage it. So, I'm happy with
> > what phylink is doing here, and I suggest fixing the drivers for
> > this race.
> >
> > In any case, it should result in less code in the drivers - since
> > the work you need to do when the link goes down is a subset of the
> > work you need to do when the network interface is taken down.
> >
>
> While I agree with all of what written before, in practice, calling
> netif_carrier_off() when using PHYLIB can cause inconsistent carrier
> states at most, but it would not be messing the state machine itself
> because PHYLIB does not make uses of netif_carrier_ok() to make any
> decisions as whether the link has dropped or not, it bases its
> information solely on phydev->link.

Indeed, but the point I'm making is that this sequence is very possible
with drivers that mess about by fiddling with stuff before they call
phy_stop():

	CPU0				CPU1
netif_carrier_off()
mvpp2_egress_disable()
				phy_state_machine()
				(phydev->state = PHY_AN)
				 phy_link_up()
				  phy_link_change()
				   netif_carrier_on()
				   mvpp2_link_event()
				    mvpp2_egress_enable()
				    mvpp2_ingress_enable()
mvpp2_port_disable()
phy_stop(ndev->phydev)

At this point, egress has not been disabled as mvpp2_stop_dev() wants,
because the phylib state machine got in before it was stopped, called
the adjust link function which then had the effect of re-enabling the
egress.

If that doesn't matter, then what's the point of the
mvpp2_egress_disable() call in the mvpp2_stop_dev() path... either it
matters and the mvpp2_stop_dev() sequence is broken, or it doesn't matter
and some of the work that mvpp2_stop_dev() is doing is unnecessary.
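
To make the interleaving above concrete, here is a minimal userspace sketch that replays it deterministically. It is only a toy model and not kernel code: struct port_model and the model_*() and stop_*() helpers are hypothetical names invented for illustration, and the boolean fields merely stand in for the carrier state, the egress/ingress enables and the "phylib state machine may still run" window between phy_start() and phy_stop().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the driver/PHY state; NOT the real kernel structures. */
struct port_model {
	bool carrier;      /* models netif_carrier_ok() */
	bool egress;       /* models the egress enable */
	bool ingress;      /* models the ingress enable */
	bool phy_running;  /* true between phy_start() and phy_stop() */
	bool phy_link;     /* what the PHY itself reports */
};

/* Models the adjust-link handler: re-enables traffic on link up. */
static void model_link_event(struct port_model *p)
{
	if (p->phy_link) {
		p->egress = true;
		p->ingress = true;
	}
}

/* Models one poll of the PHY state machine (the CPU1 column above). */
static void model_phy_state_machine(struct port_model *p)
{
	assert(p != NULL);
	if (!p->phy_running)
		return;                 /* stopped: nothing may change */
	if (p->phy_link) {
		p->carrier = true;      /* carrier comes back on */
		model_link_event(p);
	}
}

/* Racy order: driver fiddles with state first, stops the PHY last. */
static void stop_racy(struct port_model *p, bool poll_in_window)
{
	p->carrier = false;             /* driver's own carrier-off */
	p->egress = false;              /* driver's egress disable */
	if (poll_in_window)
		model_phy_state_machine(p); /* poll lands in the window */
	p->phy_running = false;         /* PHY stopped too late */
}

/* Ordered variant: stop the PHY first, then tear down. */
static void stop_ordered(struct port_model *p, bool poll_in_window)
{
	p->phy_running = false;         /* no more state machine activity */
	if (poll_in_window)
		model_phy_state_machine(p); /* now a no-op */
	p->carrier = false;
	p->egress = false;
}
```

In this model, calling stop_racy() on a port whose PHY reports link up, with the poll landing in the window, leaves egress and carrier enabled after the stop path has finished, while stop_ordered() leaves both disabled. The real phylib and mvpp2 code paths do considerably more than this; the sketch only shows why the ordering matters.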
-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
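
The earlier point in this thread, that calling netif_carrier_off() before phylink_stop() makes phylink believe the link is already down and skip mac_link_down(), can be sketched the same way. This is a toy model under stated assumptions, not the phylink implementation: struct link_model, model_resolve() and model_stop() are hypothetical names, and the comparison against the carrier flag only mirrors the behaviour described in the messages above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in; the counters record which MAC callbacks would run. */
struct link_model {
	bool carrier;             /* models netif_carrier_ok() */
	int mac_link_down_calls;  /* times the "link down" callback ran */
	int mac_link_up_calls;    /* times the "link up" callback ran */
};

/* Models a resolve step: act only when link and carrier disagree. */
static void model_resolve(struct link_model *m, bool link)
{
	assert(m != NULL);
	if (link == m->carrier)
		return;               /* looks unchanged, so nothing is called */
	if (!link) {
		m->carrier = false;
		m->mac_link_down_calls++;
	} else {
		m->mac_link_up_calls++;
		m->carrier = true;
	}
}

/* Models the stop path, optionally with the driver clearing carrier first. */
static void model_stop(struct link_model *m, bool driver_clears_carrier_first)
{
	if (driver_clears_carrier_first)
		m->carrier = false;   /* driver's early carrier-off */
	model_resolve(m, false);  /* stop forces the link down and resolves */
}
```

With the early carrier-off, model_stop() returns with mac_link_down_calls still at zero; without it, the counter is incremented once. That is the behaviour described for the case where the driver calls netif_carrier_off() itself before stopping phylink.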