From mboxrd@z Thu Jan 1 00:00:00 1970 From: Florian Fainelli Subject: Re: [PATCH RFC 0/4] Fixes for Marvell MII paged register access races Date: Sat, 9 Dec 2017 10:19:49 -0800 Message-ID: <204d19c0-05eb-3b03-275e-2a6d111cd1b0@gmail.com> References: <20171208154756.GF10595@n2100.armlinux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org To: Russell King - ARM Linux , Andrew Lunn Return-path: Received: from mail-ot0-f193.google.com ([74.125.82.193]:35455 "EHLO mail-ot0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750954AbdLISTx (ORCPT ); Sat, 9 Dec 2017 13:19:53 -0500 Received: by mail-ot0-f193.google.com with SMTP id q3so11671953oth.2 for ; Sat, 09 Dec 2017 10:19:53 -0800 (PST) In-Reply-To: <20171208154756.GF10595@n2100.armlinux.org.uk> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 12/08/2017 07:47 AM, Russell King - ARM Linux wrote: > Hi, > > While doing final testing of the mvneta changes for phylink, a very > easy to trigger race condition was found with the Marvell PHY driver > which manifested itself as the link going down when a hibernate cycle > terminates. > > The issue turned out to be a race between two threads accessing the > PHY - one trying to do a status read and the other configuring the PHY. > > The result is the configuration thread tries to read-modify-write a > paged register in a non-copper page, but the status read thread > switches the PHY back to the copper page half-way through. > > Various solutions involving phy->lock were considered, but found to > create more lock dependency issues than were nice to deal with. I can certainly imagine that, because you would ideally want a phy_device wide lock which serializes the page entry/exit, which needs to be able to run within the PHY state machine (so with phydev->lock held), but also from a context where phydev->lock is not held, wheee. > > The solution proposed here uses the mdiobus lock to ensure that accesses > to paged registers become atomic with respect to all other bus accesses, > including those from userspace. > > There is an open question whether there should be generic helpers for > this. Generic helpers would mean: > > - Additional couple of function pointers in phy_driver to read/write the > paging register. This has the restriction that there must only be one > paging register. > > - The helpers become more expensive, and because they're in a separate > compilation unit, the compiler will be unable to optimise them by > inlining the static functions. > > - The helpers would be re-usable, saving replications of that code, and > making it more likely for phy authors to safely access the PHY. > > Another potential question is whether using the mdiobus lock (which > excludes all other MII bus access) is best - while it has the advantage > of also ensuring atomicity with userspace accesses, it means that no one > else can access an independent PHY on the same bus while a paged access > is on-going. It feels like a big hammer, but I'm not convinced that we > will see a lot of contention on it. Regarding that last topic, this could become a fairly contended lock on a switch with lots (e.g: > 5-6) of built-in PHYs, all being polled (which is usually the case right now). One would expect that the polling should be limited to 2 BMSR reads to minimize the bus utilization. > > Comments? > > drivers/net/phy/marvell.c | 365 +++++++++++++++++++++------------------------ > drivers/net/phy/mdio_bus.c | 65 ++++++-- > drivers/net/phy/phy-core.c | 11 +- > include/linux/mdio.h | 3 + > include/linux/phy.h | 26 ++++ > 5 files changed, 256 insertions(+), 214 deletions(-) > -- Florian