From: Florian Fainelli
Subject: Re: [PATCH net] Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
Date: Thu, 31 Aug 2017 10:03:21 -0700
References: <1504140569-2063-1-git-send-email-f.fainelli@gmail.com>
To: Marc Gonzalez, David Daney
Cc: netdev, Geert Uytterhoeven, David Miller, Andrew Lunn, Mans Rullgard, Mason

On 08/31/2017 05:29 AM, Marc Gonzalez wrote:
> On 31/08/2017 02:49, Florian Fainelli wrote:
>
>> This reverts commit 7ad813f208533cebfcc32d3d7474dc1677d1b09a ("net: phy:
>> Correctly process PHY_HALTED in phy_stop_machine()") because it is
>> creating the possibility for a NULL pointer dereference.
>>
>> David Daney provide the following call trace and diagram of events:
>>
>> When ndo_stop() is called we call:
>>
>> phy_disconnect()
>>    +---> phy_stop_interrupts() implies: phydev->irq = PHY_POLL;
>
> What does this mean?
>
> On the contrary, phy_stop_interrupts() is only called when *not* polling.
>
>     if (phydev->irq > 0)
>         phy_stop_interrupts(phydev);
>
>>    +---> phy_stop_machine()
>>    |      +---> phy_state_machine()
>>    |              +----> queue_delayed_work(): Work queued.
>
> You're referring to the fact that, at the end of phy_state_machine()
> (in polling mode) the code reschedules itself through:
>
>     if (phydev->irq == PHY_POLL)
>         queue_delayed_work(system_power_efficient_wq, &phydev->state_queue, PHY_STATE_TIME * HZ);
>
>>    +--->phy_detach() implies: phydev->attached_dev = NULL;
>>
>> Now at a later time the queued work does:
>>
>> phy_state_machine()
>>    +---->netif_carrier_off(phydev->attached_dev): Oh no! It is NULL:
>
> I tested a sequence of 500 link up / link down in polling mode,
> and saw no such issue. Race condition?
>
> For what case in phy_state_machine() is netif_carrier_off()
> being called? Surely not PHY_HALTED?
>
>
>> The original motivation for this change originated from Marc Gonzales
>> indicating that his network driver did not have its adjust_link callback
>> executing with phydev->link = 0 while he was expecting it.
>
> I expect the core to call phy_adjust_link() for link changes.
> This used to work back in 3.4 and was broken somewhere along
> the way.

If that was working correctly in 3.4, surely we can look at the diff and
figure out what changed, maybe even find the offending commit. Can you
do that?

>
>> PHYLIB has never made any such guarantees ever because phy_stop() merely
>> just tells the workqueue to move into PHY_HALTED state which will happen
>> asynchronously.
>
> My original proposal was to fix the issue in the driver.
> I'll try locating it in my archives.

Yes, I remember you telling me that. By the way, I don't think you ever
provided a clear explanation of why this is absolutely necessary for
your driver?
-- 
Florian