All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Joakim Zhang <qiangqing.zhang@nxp.com>
Cc: Andrew Lunn <andrew@lunn.ch>, Vladimir Oltean <olteanv@gmail.com>,
	"peppe.cavallaro@st.com" <peppe.cavallaro@st.com>,
	"alexandre.torgue@foss.st.com" <alexandre.torgue@foss.st.com>,
	"joabreu@synopsys.com" <joabreu@synopsys.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"mcoquelin.stm32@gmail.com" <mcoquelin.stm32@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"f.fainelli@gmail.com" <f.fainelli@gmail.com>,
	"hkallweit1@gmail.com" <hkallweit1@gmail.com>,
	dl-linux-imx <linux-imx@nxp.com>
Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume back with WoL enabled
Date: Fri, 3 Sep 2021 10:32:46 +0100	[thread overview]
Message-ID: <20210903093246.GT22278@shell.armlinux.org.uk> (raw)
In-Reply-To: <DB8PR04MB679518228AB7B2C5CD47A1B3E6CF9@DB8PR04MB6795.eurprd04.prod.outlook.com>

On Fri, Sep 03, 2021 at 08:39:23AM +0000, Joakim Zhang wrote:
> 
> Hi Russell,
> 
> > -----Original Message-----
> > From: Russell King <linux@armlinux.org.uk>
> > Sent: 2021年9月3日 16:02
> > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > Cc: Andrew Lunn <andrew@lunn.ch>; Vladimir Oltean <olteanv@gmail.com>;
> > peppe.cavallaro@st.com; alexandre.torgue@foss.st.com;
> > joabreu@synopsys.com; davem@davemloft.net; kuba@kernel.org;
> > mcoquelin.stm32@gmail.com; netdev@vger.kernel.org; f.fainelli@gmail.com;
> > hkallweit1@gmail.com; dl-linux-imx <linux-imx@nxp.com>
> > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system resume
> > back with WoL enabled
> > 
> > On Fri, Sep 03, 2021 at 06:51:09AM +0000, Joakim Zhang wrote:
> > >
> > > > -----Original Message-----
> > > > From: Andrew Lunn <andrew@lunn.ch>
> > > > Sent: 2021年9月2日 20:24
> > > > To: Joakim Zhang <qiangqing.zhang@nxp.com>
> > > > Cc: Russell King <linux@armlinux.org.uk>; Vladimir Oltean
> > > > <olteanv@gmail.com>; peppe.cavallaro@st.com;
> > > > alexandre.torgue@foss.st.com; joabreu@synopsys.com;
> > > > davem@davemloft.net; kuba@kernel.org; mcoquelin.stm32@gmail.com;
> > > > netdev@vger.kernel.org; f.fainelli@gmail.com; hkallweit1@gmail.com;
> > > > dl-linux-imx <linux-imx@nxp.com>
> > > > Subject: Re: [PATCH] net: stmmac: fix MAC not working when system
> > > > resume back with WoL enabled
> > > >
> > > > > Emm, @andrew@lunn.ch, Andrew is much familiar with FEC, and PHY
> > > > > maintainers, Could you please help put insights here if possible?
> > > >
> > > > All the boards i have either have an Ethernet Switch connected to
> > > > the MAC, or a Micrel PHY. None are setup for WoL, since it is not
> > > > used in the use case of these boards.
> > > >
> > > > I think you need to scatter some printk() in various places to
> > > > confirm what is going on. Where is the WoL implemented: MAC or PHY,
> > > > what is suspended or not, etc.
> > >
> > > Thanks Andrew, Russell,
> > >
> > > I confirmed FEC is MAC-based WoL, and PHY is active when system
> > suspended if MAC-based WoL is active.
> > > I scatter printk() in both phy_device.c and realtek.c phy driver to debug this
> > for both WoL active and inactive case.
> > >
> > > When MAC-based WoL is active, phy_suspend() is the last point to actually
> > suspend the PHY, you can see that,
> > > 	phy_ethtool_get_wol(phydev, &wol);
> > > 	if (wol.wolopts || (netdev && netdev->wol_enabled))
> > > 		return -EBUSY;
> > >
> > > Here, netdev is true and netdev->wol_enabled is ture
> > > (net/ethtool/wol.c: ethnl_set_wol() -> dev->wol_enabled =
> > > !!wol.wolopts;) So that phydev->suspend() would not be called, PHY is active
> > after system suspended. PHY can receive packets and pass to MAC, MAC is
> > responsible for detecting magic packets then generate a wakeup interrupt. So
> > all is fine for FEC, and the behavior is clear.
> > 
> > What happens on resume with FEC?
> 
> Since we call phy_stop() in fec_suspend(), the link is down, but the PHY is active, after receiving
> magic packets, the system resume back; In fec_resume(), after restart/init FEC, we call phy_start()
> to let link up, then all is going well.

... but the link never went down! So I don't understand the last point.

> > > For STMMAC, when MAC-based WoL is active, according to the current
> > > implementation, only call phylink_mac_change()=false, PHY would be
> > > active, so PHY can receive packets then pass to MAC, MAC ignore packets
> > except magic packets. System can be waked up successfully.
> > >
> > > The issue is that phylink_mac_change()=false only notify a phylink of
> > > a change in MAC state, as we analyzed before, PHY would link up again
> > > before system suspended, which lead to .mac_link_up can't be called when
> > system resume back. Unfortunately, all MAC configurations are in
> > stmmac_mac_link_up(), as a result, MAC has not been initialized correctly
> > when system resume back, so that it can't work any longer.
> > 
> > Oh, I thought your problem was that the system didn't wake up.
> > 
> > In any case, remove the calls to phylink_mac_change() from the suspend and
> > resume functions, they are completely _incorrect_.
> 
> Ok, I will do that.
> 
> > > Intend to fix this obvious breakage, I did some work:
> > > Removing phylink_mac_change() (Russell said it's for MLO_AN_INBAND,
> > > but we have a MLO_AN_PHY) from suspend/resume path, then adding
> > > phylink_stop() in suspend, phylink_start() in resume() also for WoL active path.
> > I found remote magic packets can't wake up the system, I firstly suspect PHY
> > may be suspended. After further debug, I confirm that PHY is active, and
> > stmmac_pmt() is correctly configured.
> > 
> > As I've said a few times now, if the MAC is doing the wakeup, you need the PHY
> > to MAC link to be up, so you should _not_ call
> > phylink_stop() and phylink_start() from the suspend/resume functions because
> > they will take the link down.
> 
> Yes, I recall you said this before. Is it the requirement for phylink?
> For FEC, we call phy_stop() when suspend, and phy_start() when resume, with MAC is doing
> the wakeup.

phylink_stop() will synchronously force the phy/mac link down if it
wasn't already down, and it'll do this by calling the mac_link_down()
method.

phylink_start() will remove the force-down that phylink_stop() places,
and will re-resolve the link. You will get a "major reconfiguration"
event, and if the link is up, a mac_link_up() call.

These are primarily designed to be called from the .ndo_open and
.ndo_stop calls (as their kerneldoc mentions) but have found their way
into suspend/resume methods.

If a MAC is being suspended - as in powered down - you definitely want
to bring it down to a safe state where link events are not going to
affect the MAC. Calling phylink_stop() will do that. On resume, you
want to reconfigure and allow the MAC to receive link events, and
calling phylink_start() will do that.

However, if the MAC is not being suspended because you want WoL to
work, then you need the PHY/MAC link _not_ to be brought down so the
MAC can receive packets and examine them to see if it should wake up.
Phylink does not really cater for that case - it wasn't on my radar
as I don't have any modern systems that support suspend/resume, much
less MAC based WoL.

> > Maybe I should provide phylink_suspend()/phylink_resume() which look at the
> > netdev state just like phylib does, and conditionally call
> > phylink_stop() and phylink_start() so driver authors don't have to consider this.
> > 
> > Something like:
> > 
> > /**
> >  *...
> >  * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan  */
> > void phylink_suspend(struct phylink *phylink, bool mac_wol) {
> > 	ASSERT_RTNL();
> > 
> > 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))
> > 		phylink_stop(phylink);
> > }
> > 
> > /**
> >  *...
> >  * @mac_wol: true if the MAC needs to receive packets for Wake-on-Lan
> >  *
> >  * @mac_wol must have the same value as passed previously to
> >  * phylink_suspend().
> >  */
> > void phylink_resume(struct phylink *phylink, bool mac_wol) {
> > 	ASSERT_RTNL();
> > 
> > 	if (!mac_wol && !(phylink->netdev && phylink->netdev->wol_active))
> > 		phylink_start(phylink);
> > }
> 
> That's great!!! MAC driver authors don't need to distinguish the different cases.
> 
> > > The conclusion is that, as long as we call phylink_stop() for WoL
> > > active in suspend(), then system can't be waked up any longer, and the
> > > PHY situation is active. This let me recall what Russell mentioned in this
> > thread, if we need bring MAC link up with phylink framework to let MAC can
> > see traffic from PHY when MAC-based WoL is active?
> > >
> > > Now, I don't know where I can further dig into this issue, if you have any
> > advice please share with me , thanks in advance.
> > 
> > So my question now is: as the MAC needs to be alive while the system is
> > suspended, that implies that it has been configured to receive packets. When
> > the system resumes, why exactly doesn't the MAC continue to work? Does the
> > MAC get reset after the system comes out of resume and lose all of its
> > configuration?
> 
> Yes, as I described in commit message, when STMMAC resume back, either WoL is active or not,
> it reset the hardware then reconfig the MAC.
> stmmac_resume()->stmmac_hw_setup()->stmmac_init_dma_engine()...

Okay, so that means I need to make phylink_resume() trigger a major
config if we are using WoL.

There's a few more questions:
1. Since the state at this point will be that netdev and phylink believe
   the link to still be up, should phylink_suspend() force a
   netif_carrier_off() event to stop the netdev transmit watchdog - I
   think it ought to, even though the link will actually remain up.
2. Should we call mac_link_down() prior to the major reconfig - I think
   we should to keep the mac_link_down()/mac_link_up() calls balanced
   (as we do already guarantee.) Will that do any harm for stmmac if we
   were to call mac_link_down() as a result of a call to
   phylink_resume() ?

> > Reading what stmmac_resume() does, it seems that may well be the case, or if
> > not, the actions of stmmac_resume() ends up reprogramming a great deal of
> > the device setup. If this is the case, then yes, we need phylink to be triggered
> > to reconfigure the link - which we could do in
> > phylink_resume() if mac_wol was active.
> > 
> > While reading stmmac_resume(), I have to question the placement of this code
> > block:
> > 
> >         if (!device_may_wakeup(priv->device) || !priv->plat->pmt) {
> >                 rtnl_lock();
> >                 phylink_start(priv->phylink);
> >                 /* We may have called phylink_speed_down before */
> >                 phylink_speed_up(priv->phylink);
> >                 rtnl_unlock();
> >         }
> > 
> > in the sequence there - phylink_start() should be called when you're ready for
> > the link to come up - in other words, when you're ready to start seeing packets
> > arrive at the MAC's interface. However, the code following is clearing and
> > resetting up queues, restoring receive modes, setting up the hardware, and
> > restoring the vlan filtering.
> > Surely all that should happen before calling phylink_start(), much like it already
> > does in stmmac_open() ?
> 
> There is a story here, SNPS EQOS IP need PHY provides RXC clock for MAC's receive
> logic, so we need phylink_start() to bring PHY link up, that make PHY resume back,
> PHY could stop RXC clock when in suspended state. This is the reason why calling phylink_start()
> before re-config MAC.

Why is it different from the .ndo_stop/.ndo_open case, where the PHY may
have been suspended by the actions of .ndo_stop?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

  reply	other threads:[~2021-09-03  9:33 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-01  9:02 Joakim Zhang
2021-09-01  9:13 ` Russell King (Oracle)
2021-09-01 10:21   ` Joakim Zhang
2021-09-01 12:56     ` Russell King (Oracle)
2021-09-02  7:01       ` Joakim Zhang
2021-09-01 15:40     ` Heiner Kallweit
2021-09-02  7:35       ` Joakim Zhang
2021-09-01  9:21 ` Vladimir Oltean
2021-09-01 10:25   ` Joakim Zhang
2021-09-01 10:56     ` Vladimir Oltean
2021-09-01 11:42       ` Joakim Zhang
2021-09-01 13:25         ` Russell King (Oracle)
2021-09-02  7:28           ` Joakim Zhang
2021-09-02  8:32             ` Russell King (Oracle)
2021-09-02 10:26               ` Joakim Zhang
2021-09-02 10:49                 ` Russell King (Oracle)
2021-09-02 11:15                   ` Joakim Zhang
2021-09-02 12:24                     ` Andrew Lunn
2021-09-03  6:51                       ` Joakim Zhang
2021-09-03  8:01                         ` Russell King (Oracle)
2021-09-03  8:39                           ` Joakim Zhang
2021-09-03  9:32                             ` Russell King (Oracle) [this message]
2021-09-03 11:04                               ` Joakim Zhang
2021-09-03 12:01                                 ` Russell King (Oracle)
2021-09-03 20:12                                   ` Russell King - ARM Linux admin
2021-09-06  2:29                                     ` Joakim Zhang
2021-09-06  9:34                                       ` Russell King (Oracle)
2021-09-06 10:41                                         ` Joakim Zhang
2021-09-06 11:21                                           ` Russell King (Oracle)
2021-09-06 13:23                                             ` Andrew Lunn
2021-09-07  8:52                                             ` Russell King (Oracle)
2021-09-06  2:21                                   ` Joakim Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210903093246.GT22278@shell.armlinux.org.uk \
    --to=linux@armlinux.org.uk \
    --cc=alexandre.torgue@foss.st.com \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=hkallweit1@gmail.com \
    --cc=joabreu@synopsys.com \
    --cc=kuba@kernel.org \
    --cc=linux-imx@nxp.com \
    --cc=mcoquelin.stm32@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=peppe.cavallaro@st.com \
    --cc=qiangqing.zhang@nxp.com \
    --subject='Re: [PATCH] net: stmmac: fix MAC not working when system resume back with WoL enabled' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.