From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 85E18C433FE for ; Thu, 2 Sep 2021 23:30:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6B3ED6101A for ; Thu, 2 Sep 2021 23:30:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239521AbhIBXbC (ORCPT ); Thu, 2 Sep 2021 19:31:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33600 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233191AbhIBXbB (ORCPT ); Thu, 2 Sep 2021 19:31:01 -0400 Received: from mail-yb1-xb2c.google.com (mail-yb1-xb2c.google.com [IPv6:2607:f8b0:4864:20::b2c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6ED13C061760 for ; Thu, 2 Sep 2021 16:30:02 -0700 (PDT) Received: by mail-yb1-xb2c.google.com with SMTP id k65so6907734yba.13 for ; Thu, 02 Sep 2021 16:30:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=6xTg00PepVPylOtm1J0Y6rkZ7Wc4zH5g0ApsRZq0FUI=; b=n0XEJv7UdxOAcuracft+Xvw5QLLeajqnN5CBStW2IoBqQrqE1Dbq87vM10iYzXFByF kUaZoFC1HUuRs53G3sY8xqoEzzF0znynaeWGk0K4NlkxVzF375NsTwszhwJPQ/fLd5QE wDNFWc4sRuP44bSNwqAWirv21TiIhf/t/RQ4VnwmBrB85MFz4Sd2psvxdmin8UuwQjKV BmuvEzVXD1CJKZiwK/c2LdjhBveXGZZ9u4UEZ8RnKeBLujx7P69Uemzzhc4rWPxaoI4m nksWWTXfCGzbqYSGzFUNuSeq0v30pBFOHG99yWvFShn47nWm5dZS+jHyOZYEfCcbLbxL /daw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6xTg00PepVPylOtm1J0Y6rkZ7Wc4zH5g0ApsRZq0FUI=; b=PJSTObogAErvjAAAQoJuYtXtDrgorUbp2oLkRDGlmK6VwUioKaTIXgjflI9nGOpT8c 3sM3a3neG56XzWa1I/XoeqtUT2R0+lHwtLUF+mITaFPEn8T+JhpTyqvxKCEJ2Ozlmslr sZr2a8rUTUHY+kbhokxouoClnGnsvbwqMM9pBPOak6mp/4RRd2xRMIBeH9uP6VNan/Zb yX+qD1DAZQ9/KLR4cfroRhFHKtnT9NVxRZeeLpcB52+r06CG3eTAh1P9Cvy4V+veQSOj xaQ1OJdEurbkCuEHvKPO5TgcjWGQm/TFcB7SEHe6Ku10IBbzZXVG7kPn4ki1s+PwEKAP 4LAQ== X-Gm-Message-State: AOAM531zCk99aYUw5jYoCVha6apeLzlhg/PLPjk3h13wG3X502Ud/jcd QQy6W+i5g0Ihg5xgZbeBnTi50sW16EDXdiPkuwTVIA== X-Google-Smtp-Source: ABdhPJwDEizZwDXmFQo9X/jBsqiugcTqCUwjHAa6hvmWH2Qm4+fQv6S4de/oQBbZ8ps6OnpvyMtT6nU3W/ioGOS3xS4= X-Received: by 2002:a25:6746:: with SMTP id b67mr1341051ybc.96.1630625401243; Thu, 02 Sep 2021 16:30:01 -0700 (PDT) MIME-Version: 1.0 References: <20210901225053.1205571-1-vladimir.oltean@nxp.com> <20210902220520.hyybu6k3mjzbl7mn@skbuf> In-Reply-To: <20210902220520.hyybu6k3mjzbl7mn@skbuf> From: Saravana Kannan Date: Thu, 2 Sep 2021 16:29:25 -0700 Message-ID: Subject: Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver To: Vladimir Oltean Cc: Vladimir Oltean , netdev@vger.kernel.org, Greg Kroah-Hartman , "Rafael J. Wysocki" , Andrew Lunn , Heiner Kallweit , Russell King , "David S. Miller" , Jakub Kicinski , Vivien Didelot , Florian Fainelli , linux-kernel@vger.kernel.org, Linus Walleij , =?UTF-8?Q?Alvin_=C5=A0ipraga?= , ACPI Devel Maling List , kernel-team , Len Brown Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 2, 2021 at 3:05 PM Vladimir Oltean wrote: > > On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote: > > This is a continuation of the discussion on patch "[v1,1/2] driver core: > > fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT" from here: > > https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/ > > > > Summary: in a complex combination of device dependencies which is not > > really relevant to what is being proposed here, DSA ends up calling > > phylink_of_phy_connect during a period in which the PHY driver goes > > through a series of probe deferral events. > > > > The central point of that discussion is that DSA seems "broken" for > > expecting the PHY driver to probe immediately on PHYs belonging to the > > internal MDIO buses of switches. A few suggestions were made about what > > to do, but some were not satisfactory and some did not solve the problem. > > > > In fact, fw_devlink, the mechanism that causes the PHY driver to defer > > probing in this particular case, has some significant "issues" too, but > > its "issues" are only in quotes "because at worst it'd allow a few > > unnecessary deferred probes": > > https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/#24418895 > > > > So if that's the criterion by which an issue is an issue, maybe we > > should take a step back and look at the bigger picture. > > > > There is nothing about the idea that a PHY might defer probing, or about > > the changes proposed here, that has anything with DSA. Furthermore, the > > changes done by this series solve the problem in the same way: "they > > allow a few unnecessary deferred probes" <- in this case they provoke > > this to the caller of phy_attach_direct. > > > > If we look at commit 16983507742c ("net: phy: probe PHY drivers > > synchronously"), we see that the PHY library expectation is for the PHY > > device to have a PHY driver bound to it as soon as device_add() finishes. > > > > Well, as it turns out, in case the PHY device has any supplier which is > > not ready, this is not possible, but that patch still seems to ensure > > that the process of binding a driver to the device has at least started. > > That process will continue for a while, and will race with > > phy_attach_direct calls, so we need to make the latter observe the fact > > that a driver is struggling to probe, and wait for it a bit more. > > > > What I've not tested is loading the PHY module at runtime, and seeing > > how phy_attach_direct behaves then. I expect that this change set will > > not alter the behavior in that case: the genphy will still bind to a > > device with no driver, and phy_attach_direct will not return -EPROBE_DEFER > > in that case. > > > > I might not be very versed in the device core/internals, but the patches > > make sense to me, and worked as intended from the first try on my system > > (Turris MOX with mv88e6xxx), which was modified to make the same "sins" > > as those called out in the thread above: > > > > - use PHY interrupts provided by the switch itself as an interrupt-controller > > - call of_mdiobus_register from setup() and not from probe(), so as to > > not circumvent fw_devlink's limitations, and still get to hit the PHY > > probe deferral conditions. > > > > So feedback and testing on other platforms is very appreciated. > > > > Vladimir Oltean (3): > > net: phy: don't bind genphy in phy_attach_direct if the specific > > driver defers probe > > net: dsa: destroy the phylink instance on any error in > > dsa_slave_phy_setup > > net: dsa: allow the phy_connect() call to return -EPROBE_DEFER > > > > drivers/base/dd.c | 21 +++++++++++++++++++-- > > drivers/net/phy/phy_device.c | 8 ++++++++ > > include/linux/device.h | 1 + > > net/dsa/dsa2.c | 2 ++ > > net/dsa/slave.c | 12 +++++------- > > 5 files changed, 35 insertions(+), 9 deletions(-) > > > > -- > > 2.25.1 > > > > Ouch, I just realized that Saravana, the person whose reaction I've been > waiting for the most, is not copied.... > > Saravana, you can find the thread here to sync up with what has been > discussed: > https://patchwork.kernel.org/project/netdevbpf/cover/20210901225053.1205571-1-vladimir.oltean@nxp.com/ Woah! The thread blew up. > > Sorry. No worries. I'll read through the thread later and maybe provide more responses, but one thing I wanted to say right away: Don't depend on dev->p->deferred_probe. It can be "empty" for a device that has returned -EPROBE_DEFER for a bunch of reasons: 1. When the device is in the middle of being reattempted, it would be empty. You can't hold any lock that'll ensure correctness either because deferred probe locking is a mess (I'm working on cleaning that up). 2. I'm working on actually not adding devices to that list if there's a known supplier that hasn't been probed yet. No point retrying it again and again for every deferred probe trigger when we know it's going to fail. And we'll basically get topological probe ordering. Your closest bet right now is d->can_match. Only caveat is that it's not cleared if the actual driver gets unregistered. -Saravana