From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B413CC282C4 for ; Tue, 12 Feb 2019 20:55:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 7376D222C0 for ; Tue, 12 Feb 2019 20:55:09 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="R0ni+b4P" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732191AbfBLUzI (ORCPT ); Tue, 12 Feb 2019 15:55:08 -0500 Received: from mail-wr1-f65.google.com ([209.85.221.65]:32806 "EHLO mail-wr1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730035AbfBLUzH (ORCPT ); Tue, 12 Feb 2019 15:55:07 -0500 Received: by mail-wr1-f65.google.com with SMTP id i12so120328wrw.0 for ; Tue, 12 Feb 2019 12:55:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=tN1tObT6Y168b4WaGS75kqKd6GN+mxK89sspmsAnYzo=; b=R0ni+b4PLLc9HG8SWhWks0uvOk2WZnsdQh+ibluMOsX416dOUNDpMxjW6upac/dHrf LuhdbD/5D5MH65VcTMJISAwt8k5ikS/yh/h1YeVH8C+DNk4YAJX+sdVzwnmu2V6sqLVA 53zJf8/1wmIZlUs+d7QtpwwFT1l7ORIt67S6LcoXnDkwEqGNZym5cugSiuVuxJHpYQbQ kxUijH2Z0jutCTcNEGeImS8TMtoA3fUwN2dcxJ6h3Xl/y3DPMUmRpdEBwPqk+quAkoDS wlWh/pY9WMt2a/BPw7SkT4XJmbdmAUT4QvYTybBGh9gsEuVTa39R+lOrmbgTfZ384ArS 1uZA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=tN1tObT6Y168b4WaGS75kqKd6GN+mxK89sspmsAnYzo=; b=mqEovmQ6Sn8sLBYJYisMOMwtRA1yYwsvv3eRMD6RDZTb5nIoQW4s/MMXm9TxXc9K6v G0f1jpz04yWHzRmw5e5S7qAKCaP2rC8CqZ2v9rKZ/EqP3HFyJCp+bvZZOT7Y5PEU77yN GyiQdVbC3bHPIEYtnqGq3PX7kn52+dG+hx8mVKM7/vpB8Y1xitaBEy/qsJPXl879rxDx kXeySR+krokBs9N72AK4kjQrH12mM4ha810WAYnTIoolBf+oALF4bx5o9odGxnrECZOW dbH5741G4FjUJgBs0gf5Zip/lRQWINX7bLNOqPJPdLa39bwz2BMo5b1JpfV9CsDlok9U A9vA== X-Gm-Message-State: AHQUAuZVSTnDyAUVP6r13dvCla4rTZHhkuWYM1+ETwJex5hB1DP+GjA7 GnB3RKnCbjpjROpNjVJ7ibc8hSHP X-Google-Smtp-Source: AHgI3IYm9wdJe1WsRJUC7ZTJ39+SkjyZw7LON+Da4xcw7mLPC14NSorONJvZgEt3MT0BR63kl3VAkA== X-Received: by 2002:a5d:620d:: with SMTP id y13mr4356664wru.119.1550004904995; Tue, 12 Feb 2019 12:55:04 -0800 (PST) Received: from ?IPv6:2003:ea:8bf1:e200:788d:bad3:281d:6cd0? (p200300EA8BF1E200788DBAD3281D6CD0.dip0.t-ipconnect.de. [2003:ea:8bf1:e200:788d:bad3:281d:6cd0]) by smtp.googlemail.com with ESMTPSA id f22sm3404922wmj.26.2019.02.12.12.55.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 12 Feb 2019 12:55:03 -0800 (PST) Subject: Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit To: Russell King - ARM Linux admin Cc: Andrew Lunn , John David Anglin , Vivien Didelot , Florian Fainelli , netdev@vger.kernel.org References: <20190130223846.GB30115@lunn.ch> <9415d82e-965b-7777-0ad0-f23d6c9f177e@bell.net> <53b49df8-53ed-704f-9197-230b18d83090@bell.net> <824d011b-3692-69c3-5e2c-58e950a80abf@bell.net> <6a1ebc61-3505-beb8-21cb-ea42ad9fe67e@bell.net> <20190211233327.GB8591@lunn.ch> <2b6bbb4c-1346-461b-ff7a-cb96b4142f7a@bell.net> <20190212035806.GE19023@lunn.ch> <13c1e6d5-c287-0091-3b24-1978f9a18e7e@gmail.com> <20190212163017.lwstmgtyw76cwrd7@shell.armlinux.org.uk> From: Heiner Kallweit Message-ID: <5ba5b654-ca61-253f-042a-2a178ff86b36@gmail.com> Date: Tue, 12 Feb 2019 21:54:55 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0 MIME-Version: 1.0 In-Reply-To: <20190212163017.lwstmgtyw76cwrd7@shell.armlinux.org.uk> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On 12.02.2019 17:30, Russell King - ARM Linux admin wrote: > On Tue, Feb 12, 2019 at 07:51:05AM +0100, Heiner Kallweit wrote: >> On 12.02.2019 04:58, Andrew Lunn wrote: >>> That change means we don't check the PHY device if it caused an >>> interrupt when its state is less than UP. >>> >>> What i'm seeing is that the PHY is interrupting pretty early on after >>> a reboot when the previous boot had the interface up. >>> >> So this means that when going down for reboot the interrupts are not >> properly masked / disabled? Because (at least for net-next) we enable >> interrupts in phy_start() only. > [..] > In looking at this, I came across this chunk of code: > > static inline bool __phy_is_started(struct phy_device *phydev) > { > WARN_ON(!mutex_is_locked(&phydev->lock)); > > return phydev->state >= PHY_UP; > } > > /** > * phy_is_started - Convenience function to check whether PHY is started > * @phydev: The phy_device struct > */ > static inline bool phy_is_started(struct phy_device *phydev) > { > bool started; > > mutex_lock(&phydev->lock); > started = __phy_is_started(phydev); > mutex_unlock(&phydev->lock); > > return started; > } > > which looks to me like over-complication. The mutex locking there is > completely pointless - what are you trying to achieve with it? > > Let's go through this. The above is exactly equivalent to: > > bool phy_is_started(phydev) > { > int state; > > mutex_lock(&phydev->lock); > state = phydev->state; > mutex_unlock(&phydev->lock); > > return state >= PHY_UP; > } > > since when we do the test is irrelevant. Architectures that Linux > runs on are single-copy atomic, which means that reading phydev->state > itself is an atomic operation. So, the mutex locking around that > doesn't add to the atomicity of the entire operation. > > How, depending on what you do with the rest of this function depends > whether the entire operation is safe or not. For example, let's take > this code at the end of phy_state_machine(): > > if (phy_polling_mode(phydev) && phy_is_started(phydev)) > phy_queue_state_machine(phydev, PHY_STATE_TIME); > > state = PHY_UP > thread 0 thread 1 > phy_disconnect() > +-phy_is_started() > phy_is_started() | > `-phy_stop() > +-phydev->state = PHY_HALTED > `-phy_stop_machine() > `-cancel_delayed_work_sync() > phy_queue_state_machine() > `-mod_delayed_work() > > At this point, the phydev->state_queue() has been added back onto the > system workqueue despite phy_stop_machine() having been called and > cancel_delayed_work_sync() called on it. > > The original code in 4.20 did not have this race condition. > > Basically, the lock inside phy_is_started() does nothing useful, and > I'd say is dangerously misleading. > Then idea would be to first remove the locking from phy_is_started() and in a second step do the following to prevent the described race (phy_stop() takes phydev->lock too). diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index c1ed03800..69dc64a4d 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -957,8 +957,10 @@ void phy_state_machine(struct work_struct *work) * state machine would be pointless and possibly error prone when * called from phy_disconnect() synchronously. */ + mutex_lock(&phydev->lock); if (phy_polling_mode(phydev) && phy_is_started(phydev)) phy_queue_state_machine(phydev, PHY_STATE_TIME); + mutex_unlock(&phydev->lock); } Heiner