From: Russell King - ARM Linux admin <linux@armlinux.org.uk>
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Andrew Lunn <andrew@lunn.ch>,
John David Anglin <dave.anglin@bell.net>,
Vivien Didelot <vivien.didelot@savoirfairelinux.com>,
Florian Fainelli <f.fainelli@gmail.com>,
netdev@vger.kernel.org
Subject: Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
Date: Tue, 12 Feb 2019 22:55:05 +0000
Message-ID: <20190212225505.hoodvbnnru6dliu7@shell.armlinux.org.uk>
In-Reply-To: <5ba5b654-ca61-253f-042a-2a178ff86b36@gmail.com>
On Tue, Feb 12, 2019 at 09:54:55PM +0100, Heiner Kallweit wrote:
> On 12.02.2019 17:30, Russell King - ARM Linux admin wrote:
> > On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote:
> >> On 12.02.2019 04:58, Andrew Lunn wrote:
> >>> That change means we don't check the PHY device if it caused an
> >>> interrupt when its state is less than UP.
> >>>
> >>> What i'm seeing is that the PHY is interrupting pretty early on after
> >>> a reboot when the previous boot had the interface up.
> >>>
> >> So this means that when going down for reboot the interrupts are not
> >> properly masked / disabled? Because (at least for net-next) we enable
> >> interrupts in phy_start() only.
> >
> [..]
> > In looking at this, I came across this chunk of code:
> >
> > static inline bool __phy_is_started(struct phy_device *phydev)
> > {
> >         WARN_ON(!mutex_is_locked(&phydev->lock));
> >
> >         return phydev->state >= PHY_UP;
> > }
> >
> > /**
> >  * phy_is_started - Convenience function to check whether PHY is started
> >  * @phydev: The phy_device struct
> >  */
> > static inline bool phy_is_started(struct phy_device *phydev)
> > {
> >         bool started;
> >
> >         mutex_lock(&phydev->lock);
> >         started = __phy_is_started(phydev);
> >         mutex_unlock(&phydev->lock);
> >
> >         return started;
> > }
> >
> > which looks to me like over-complication. The mutex locking there is
> > completely pointless - what are you trying to achieve with it?
> >
> > Let's go through this. The above is exactly equivalent to:
> >
> > bool phy_is_started(phydev)
> > {
> >         int state;
> >
> >         mutex_lock(&phydev->lock);
> >         state = phydev->state;
> >         mutex_unlock(&phydev->lock);
> >
> >         return state >= PHY_UP;
> > }
> >
> > since when we do the test is irrelevant. Architectures that Linux
> > runs on are single-copy atomic, which means that reading phydev->state
> > itself is an atomic operation. So, the mutex locking around that
> > doesn't add to the atomicity of the entire operation.
> >
> > However, whether the entire operation is safe depends on what you do
> > with the result in the rest of the function. For example, let's take
> > this code at the end of phy_state_machine():
> >
> >         if (phy_polling_mode(phydev) && phy_is_started(phydev))
> >                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> >
> >     state = PHY_UP
> >         thread 0                    thread 1
> >                                     phy_disconnect()
> >                                     +-phy_is_started()
> >     phy_is_started()                |
> >                                     `-phy_stop()
> >                                       +-phydev->state = PHY_HALTED
> >                                       `-phy_stop_machine()
> >                                         `-cancel_delayed_work_sync()
> >     phy_queue_state_machine()
> >     `-mod_delayed_work()
> >
> > At this point, phydev->state_queue has been added back onto the
> > system workqueue despite phy_stop_machine() having been called and
> > having run cancel_delayed_work_sync() on it.
> >
> > The original code in 4.20 did not have this race condition.
> >
> > Basically, the lock inside phy_is_started() does nothing useful, and
> > I'd say is dangerously misleading.
> >
> Then the idea would be to first remove the locking from phy_is_started()
> and, in a second step, do the following to prevent the described race
> (phy_stop() takes phydev->lock too).
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index c1ed03800..69dc64a4d 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -957,8 +957,10 @@ void phy_state_machine(struct work_struct *work)
>          * state machine would be pointless and possibly error prone when
>          * called from phy_disconnect() synchronously.
>          */
> +       mutex_lock(&phydev->lock);
>         if (phy_polling_mode(phydev) && phy_is_started(phydev))
>                 phy_queue_state_machine(phydev, PHY_STATE_TIME);
> +       mutex_unlock(&phydev->lock);
>  }
Yep, that approach would certainly be better. I didn't exhaustively
audit the 5.0-rc code though.
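
To make the check-then-act problem concrete, here is a minimal userspace
sketch that replays the interleaving from the diagram sequentially. It is
not kernel code: the model_* and replay_* names, the two-value state enum
and the work_queued flag (standing in for the delayed work being pending)
are all assumptions made for illustration, and a pthread mutex stands in
for phydev->lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
enum model_state { MODEL_HALTED, MODEL_UP };

struct model_phy {
	pthread_mutex_t lock;
	enum model_state state;
	bool work_queued;	/* stands in for the delayed work being pending */
};

static void model_init(struct model_phy *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->state = MODEL_UP;
	p->work_queued = false;
}

/* Locked read only: the answer can be stale as soon as the lock drops. */
static bool model_is_started(struct model_phy *p)
{
	bool started;

	pthread_mutex_lock(&p->lock);
	started = p->state >= MODEL_UP;
	pthread_mutex_unlock(&p->lock);
	return started;
}

/* Stop path: change state under the lock, then cancel pending work. */
static void model_stop(struct model_phy *p)
{
	pthread_mutex_lock(&p->lock);
	p->state = MODEL_HALTED;
	pthread_mutex_unlock(&p->lock);
	p->work_queued = false;
}

/* Check and requeue inside one critical section. */
static void model_requeue_locked(struct model_phy *p)
{
	pthread_mutex_lock(&p->lock);
	if (p->state >= MODEL_UP)
		p->work_queued = true;
	pthread_mutex_unlock(&p->lock);
}

/* Replays the diagram's interleaving; returns true if work stays queued. */
bool replay_check_then_act(void)
{
	struct model_phy p;
	bool started;

	model_init(&p);
	started = model_is_started(&p);	/* thread 0 checks */
	model_stop(&p);			/* thread 1 stops and cancels */
	assert(p.state == MODEL_HALTED);
	if (started)
		p.work_queued = true;	/* thread 0 requeues on a stale answer */
	return p.work_queued;
}

/*
 * With the lock spanning check and requeue, the state change cannot land
 * in between; the stop runs either wholly before or wholly after.
 */
bool replay_locked(bool stop_first)
{
	struct model_phy p;

	model_init(&p);
	if (stop_first) {
		model_stop(&p);
		model_requeue_locked(&p);
	} else {
		model_requeue_locked(&p);
		model_stop(&p);
	}
	return p.work_queued;
}
```

In this model replay_check_then_act() returns true (work left queued
after the stop), while replay_locked() returns false for both orderings.
That is the property the proposed mutex_lock()/mutex_unlock() pair around
the check and the requeue is aiming for; whether it holds on every path in
the real code would still need the audit mentioned above.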
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up