From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BAAAC432BE for ; Fri, 27 Aug 2021 04:02:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 76EC660F45 for ; Fri, 27 Aug 2021 04:02:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234759AbhH0EDo (ORCPT ); Fri, 27 Aug 2021 00:03:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234024AbhH0EDm (ORCPT ); Fri, 27 Aug 2021 00:03:42 -0400 Received: from mail-yb1-xb33.google.com (mail-yb1-xb33.google.com [IPv6:2607:f8b0:4864:20::b33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 039F1C061796 for ; Thu, 26 Aug 2021 21:02:54 -0700 (PDT) Received: by mail-yb1-xb33.google.com with SMTP id z5so10078619ybj.2 for ; Thu, 26 Aug 2021 21:02:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=5EX0Ib5T66NZEZQUvNen4rx8W6LbQAQc16oQp/aoMH8=; b=q2wz4tOPNUK/CXTrqsq327gnvpzMOo2L//zvFOMKtCAvsMZWguv+T8jRd35dUN7jm0 3h7Vef3WRREfLGFp4HP9fb4AzB8oJNmS7Przcq/dfU5QgNU76mFG5ggqILgq63LH3fIw FJJZgu1FH3UTKcBOi0588DRyPPZ3DFo0mhdcqwy0wOutq9ZPY9LUBr4X/TAlMKlJ/iu1 /EYSG7UnjZ+o2iHGPPeLeBrd0XmE5jHZICZcNESLQXv2idlHQxE2k4K4uZeRzUzfRprE orRrgVI+cydeAu181BPJa6GCZCIJ6URe6iQpA7AtfhvF2LI3x9HLALTjlR3cDed15lNV 2HuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=5EX0Ib5T66NZEZQUvNen4rx8W6LbQAQc16oQp/aoMH8=; b=kHYjs1i5++VfJYYYcGlugNatQnKMuVzFWMmvXQFT/R7TKB1BEB2gMjx6nduzgJiLwj 8wwBj7877PnZMsI50mYth7JfHVMyoLrQVBPqiONtlt/k0f6ehYE2YysADdALwKyf7Kcd nqc14ryyqqptmcGURW7XRCy44to/ZEzLqUITACzOO5w62U3H+mRWZ9Xd2PXz8qHI92Rl dNEbuJWFEM2baTLx9jN94lZgmIUsvKh7u/ezY8DoRv1LBHgul3pu9YgEH72/OSw25yX2 KyvvSiTb7nQ82OPYTDKeAaqSQH4LGBmlgB4GVXoY9VTQJORHkG+ja8l5veMwqzSHIFDk 0hjg== X-Gm-Message-State: AOAM531aPN/FO0LiZjm08Dg+59U13On1oLnITL3EH7nc88Lja2+8+mnp dU2h1gWDCfFrpLn4a3zPBvdMvRs54J5flLYnPmpArw== X-Google-Smtp-Source: ABdhPJyM0ggNxJXeHokAmTYi2nyU/dC2IeiojBgwYL1tAXquHUrAImkrDV6i+MVdfHqIjbkB7k+XWvdvv8LmFN6fZEU= X-Received: by 2002:a25:d213:: with SMTP id j19mr2860896ybg.20.1630036972958; Thu, 26 Aug 2021 21:02:52 -0700 (PDT) MIME-Version: 1.0 References: <20210826074526.825517-1-saravanak@google.com> <20210826074526.825517-2-saravanak@google.com> In-Reply-To: From: Saravana Kannan Date: Thu, 26 Aug 2021 21:02:16 -0700 Message-ID: Subject: Re: [PATCH v1 1/2] driver core: fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT To: Andrew Lunn Cc: Greg Kroah-Hartman , "Rafael J. Wysocki" , Linus Walleij , Vivien Didelot , Florian Fainelli , Vladimir Oltean , "David S. Miller" , Jakub Kicinski , Len Brown , Alvin Sipraga , kernel-team@android.com, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-acpi@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Aug 26, 2021 at 6:23 PM Andrew Lunn wrote: > > > Doesn't add much to the discussion. In the example I gave, the driver > > already does synchronous probing. If the device can't probe > > successfully because a supplier isn't ready, it doesn't matter if it's > > a synchronous probe. The probe would still be deferred and we'll hit > > the same issue. Even in the situation the commit [5] describes, if > > parallelized probing is done and the PHY depended on something (say a > > clock), you'd still end up not probing the PHY even if the driver is > > present and the generic PHY would end up force probing it. > > > genphy is meant to be used when there is no other driver available. > It is a best effort, better than nothing, might work. And quite a few > boards rely on it. However, it should not be used when there is a > specific driver. Agreed, that's what we are trying to ensure. > So if the PHY device has been probed, and -EPROBE_DEFER was returned, > we also need to return -EPROBE_DEFER here when deciding if genphy > should be used. It should then all unwind and try again later. Yes, I think dsa_register_switch() returning -EPROBE_DEFER if the PHYs returned -EPROBE_DEFER might be okay (I think we should do it), but that doesn't solve the problem for this driver. fw_devlink=on/device links short circuits the probe() call of a consumer (in this case the PHY) and returns -EPROBE_DEFER if the supplier's (in this case switch) probe hasn't finished without an error. fw_devlink/device links effectively does the probe in graph topological order and there's a ton of good reasons to do it that way -- what's why fw_devlink=on was implemented. In this specific case though, since the PHY depends on the parent device, if we fail the parent's probe realtek_smi_probe() because the PHYs failed to probe, we'll get into a catch-22/chicken-n-egg situation and the switch/PHYs will never probe. I think a clean way to fix this at the driver level is to do what I said in [6]. Copy pasting it here and expanding it a bit: 1. The IRQ registration and mdio bus registration should get moved to realtek_smi_probe() which probes "realtek,rtl8366rb". So realtek_smi_probe() succeeding doesn't depend on its child devices probing successfully (which makes sense for any parent device). 2. realtek_smi_probe() should also create/register a component-master/aggregate device that's "made up of" realtek,rtl8366rb and all the PHYs. So the component-master will wait for all of them to finish probing before it's initialized. 3. PHYs will probe successfully now because realtek,rtl8366rb probe() which is the supplier's probe has finished probing without problems. 4. The component device's init (the .bind op) would call dsa_register_switch() which kinda makes sense because the rtl8366rb and all the PHYs combined together is what makes up the logical DSA switch. The dsa_register_switch() will succeed and will be using the right/specific PHY driver. The same applies for any switch that has the PHYs as it's child device AND (this is the key part) the PHYs depend on the switch as a supplier (remember, if we didn't have the interrupt dependency, this would not be an issue). > I don't know the device core, but it looks like dev->can_match tells > us what we need to know. If true, we know there is a driver for this > device. But i'm hesitant to make use of this outside of driver/base. can_match is never cleared once it's set and it's meant as an optimization/preserving some probe order stuff. I wouldn't depend on it for this case. We can just have a phy_has_driver() function that searches all the currently registered PHY drivers to check if there's a matching driver. And dsa_register_switch() or phy_attach_direct() can return -EPROBE_DEFER if there is a driver but it isn't bound yet. Again, this is orthogonal to the realtek driver fix though because of the catch-22 situation above. -Saravana [6] - https://lore.kernel.org/netdev/CAGETcx8_vxxPxF8WrXqk=PZYfEggsozP+z9KyOu5C2bEW0VW8g@mail.gmail.com/