From: Florian Fainelli <f.fainelli@gmail.com>
To: Russell King - ARM Linux <linux@armlinux.org.uk>,
Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Yan Markman <ymarkman@marvell.com>,
Antoine Tenart <antoine.tenart@free-electrons.com>,
"andrew@lunn.ch" <andrew@lunn.ch>,
"davem@davemloft.net" <davem@davemloft.net>,
"gregory.clement@free-electrons.com"
<gregory.clement@free-electrons.com>,
"thomas.petazzoni@free-electrons.com"
<thomas.petazzoni@free-electrons.com>,
"miquel.raynal@free-electrons.com"
<miquel.raynal@free-electrons.com>,
Nadav Haklai <nadavh@marvell.com>,
"mw@semihalf.com" <mw@semihalf.com>,
Stefan Chulski <stefanc@marvell.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [EXT] Re: [PATCH net] net: phylink: fix link state on phy-connect
Date: Fri, 1 Dec 2017 09:36:42 -0800
Message-ID: <221420a4-9f56-373e-f5cd-0d2fcb02e5fb@gmail.com>
In-Reply-To: <20171201172440.GK10595@n2100.armlinux.org.uk>

On 12/01/2017 09:24 AM, Russell King - ARM Linux wrote:
> On Fri, Dec 01, 2017 at 11:07:22AM -0600, Grygorii Strashko wrote:
>> Hi Russell,
>>
>> On 11/30/2017 07:28 AM, Russell King - ARM Linux wrote:
>>> On Thu, Nov 30, 2017 at 10:10:18AM +0000, Russell King - ARM Linux wrote:
>>>> On Thu, Nov 30, 2017 at 08:51:21AM +0000, Yan Markman wrote:
>>>>> The phylink_stop is called before phylink_disconnect_phy
>>>>> You could see in mvpp2.c:
>>>>>
>>>>> mvpp2_stop_dev() {
>>>>> phylink_stop(port->phylink);
>>>>> }
>>>>>
>>>>> mvpp2_stop() {
>>>>> mvpp2_stop_dev(port);
>>>>> phylink_disconnect_phy(port->phylink);
>>>>> }
>>>>>
>>>>> .ndo_stop = mvpp2_stop,
>>>>
>>>> Sorry, I don't have this in mvpp2.c, so I have no visibility of what
>>>> you're working with.
>>>>
>>>> What you have above looks correct, and I see no reason why the p21
>>>> patch would not have resolved your issue. The p21 patch ensures
>>>> that phylink_resolve() gets called and completes before phylink_stop()
>>>> returns. In that case, phylink_resolve() will call the mac_link_down()
>>>> method if the link is not already down. It will also print the "Link
>>>> is Down" message.
>>>>
>>>> Florian has already tested this patch after encountering a similar
>>>> issue, and has reported that it solves the problem for him. I've also
>>>> tested it with mvneta, and the original mvpp2x driver on Macchiatobin.
>>>>
>>>> Maybe there's something different about mvpp2, but as I have no
>>>> visibility of that driver and the modifications therein, I can't
>>>> comment further other than stating that it works for three different
>>>> implementations.
>>>>
>>>> Maybe you could try and work out what's going on with the p21 patch
>>>> in your case?
>>>
>>> I think I now realise what's probably going on.
>>>
>>> If you call netif_carrier_off() before phylink_stop(), then phylink will
>>> believe that the link is already down, and so it won't bother calling
>>> mac_link_down() - it will believe that the link is already down.
>>>
>>> I'll update the documentation for phylink_stop() to spell out this
>>> aspect.
>>>
>>
>> There is a pretty high number of net drivers which call
>> netif_carrier_off(dev);
>> before
>> phy_stop(dev->phydev);
>> in .ndo_stop() callback.
>>
>> As per your comment this seems to be incorrect, so should such calls be
>> removed?
>
> Well, I think the question that needs to be asked is this:
>
> Is calling netif_carrier_off() before phy_stop() safe?
>
> Well, reading the phylib code, this is the answer I've come to:
>
> Between phy_start() and phy_stop(), phylib is free to manage the
> carrier state itself through the phylib state machine.
>
> This means if you call netif_carrier_off() prior to phy_stop(),
> there is nothing preventing the phylib state machine from running,
> and a co-incident poll of the PHY could notice that the link has
> come up, and re-enable the carrier while your ndo_stop() method
> is still running.
>
> So, my conclusion is that this practice is provably racy, though
> it's probably not that easy to trigger the race (which is probably
> why no one has reported the problem.)
>
> Given that it's racy, it's not something that I think phylink should
> care about, and should "softly" discourage it. So, I'm happy with
> what phylink is doing here, and I suggest fixing the drivers for
> this race.
>
> In any case, it should result in less code in the drivers - since
> the work you need to do when the link goes down is a subset of the
> work you need to do when the network interface is taken down.
>
While I agree with everything written above, in practice, calling
netif_carrier_off() when using PHYLIB can at most cause an inconsistent
carrier state. It would not disturb the state machine itself, because
PHYLIB does not use netif_carrier_ok() to decide whether the link has
dropped; it bases that decision solely on phydev->link.

This is not true with PHYLINK, which is why the problem was observed here.
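
To make the difference concrete, here is a minimal user-space sketch. It
is a toy model, not kernel code: struct toy_port, toy_stop() and
toy_ndo_stop() are hypothetical names invented for illustration, and the
only thing modelled is which flag each library consults when deciding
whether a link-down transition still has to be handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a port; these are not kernel structures. */
struct toy_port {
	bool carrier_ok;	/* what netif_carrier_ok() would report */
	bool phy_link;		/* what phydev->link would hold */
};

enum toy_style { TOY_PHYLIB, TOY_PHYLINK };

/*
 * Model of the stop path. Returns true if the link was considered up,
 * i.e. if the link-down handling (mac_link_down() in the PHYLINK case)
 * would have been run.
 */
static bool toy_stop(struct toy_port *p, enum toy_style style)
{
	bool was_up;

	assert(p != NULL);

	/* PHYLIB-style: trust only the PHY's own link flag.
	 * PHYLINK-style: compare against the carrier state. */
	was_up = (style == TOY_PHYLIB) ? p->phy_link : p->carrier_ok;

	p->phy_link = false;
	p->carrier_ok = false;
	return was_up;
}

/*
 * Model of an ndo_stop() callback on a port whose link is up.
 * carrier_off_first models calling netif_carrier_off() before stopping.
 */
static bool toy_ndo_stop(enum toy_style style, bool carrier_off_first)
{
	struct toy_port p = { .carrier_ok = true, .phy_link = true };

	if (carrier_off_first)
		p.carrier_ok = false;

	return toy_stop(&p, style);
}
```

In this model only the PHYLINK-style path with the carrier forced off
first skips the link-down handling, which matches the symptom reported
in this thread.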
--
Florian