From: Saravana Kannan <email@example.com> To: Vladimir Oltean <firstname.lastname@example.org> Cc: Andrew Lunn <email@example.com>, Greg Kroah-Hartman <firstname.lastname@example.org>, "Rafael J. Wysocki" <email@example.com>, Linus Walleij <firstname.lastname@example.org>, Vivien Didelot <email@example.com>, Florian Fainelli <firstname.lastname@example.org>, "David S. Miller" <email@example.com>, Jakub Kicinski <firstname.lastname@example.org>, Len Brown <email@example.com>, Alvin Sipraga <ALSI@bang-olufsen.dk>, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com Subject: Re: [PATCH v1 1/2] driver core: fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT Date: Tue, 31 Aug 2021 19:19:40 -0700 [thread overview] Message-ID: <CAGETcx8r7o9u9bveQx6TAXG8YLH+aiuz9VZ5pLACm=S6KxNpWQ@mail.gmail.com> (raw) In-Reply-To: <20210901013830.yst73ubhsrlml54i@skbuf> On Tue, Aug 31, 2021 at 6:38 PM Vladimir Oltean <firstname.lastname@example.org> wrote: > > On Wed, Sep 01, 2021 at 04:28:26AM +0300, Vladimir Oltean wrote: > > On Wed, Sep 01, 2021 at 02:18:04AM +0300, Vladimir Oltean wrote: > > > On Wed, Sep 01, 2021 at 01:02:09AM +0200, Andrew Lunn wrote: > > > > Rev B is interesting because switch0 and switch1 got genphy, while > > > > switch2 got the correct Marvell PHY driver. switch2 PHYs don't have > > > > interrupt properties, so don't loop back to their parent device. > > > > > > This is interesting and not what I really expected to happen. It goes to > > > show that we really need more time to understand all the subtleties of > > > device dependencies before jumping on patching stuff. > > > > There is an even more interesting variation which I would like to point > > out. It seems like a very odd loophole in the device links. > > > > Take the example of the mv88e6xxx DSA driver. On my board > > (arch/arm64/boot/dts/marvell/armada-3720-turris-mox.dts), even after I > > had to declare the switches as interrupt controller and add interrupts > > to their internal PHYs, I still need considerable force to 'break' this > > board in the way discussed in this thread. The correct PHY driver insists > > to probe, and not genphy. Let me explain. > > > > The automatic device links between the switch (supplier, as interrupt-controller) > > and PHYs (consumers) are added by fwnode_link_add, called from of_link_to_phandle. > > > > Important note: fwnode_link_add does not link devices, it links OF nodes. > > > > Even more important node, in the form of a comment: > > > > * The driver core will use the fwnode link to create a device link between the > > * two device objects corresponding to @con and @sup when they are created. The > > * driver core will automatically delete the fwnode link between @con and @sup > > * after doing that. > > > > Okay?! > > > > What seems to be omitted is that the DSA switch driver's probing itself > > can be deferred. For example: > > > > dsa_register_switch > > -> dsa_switch_probe > > -> dsa_switch_parse_of > > -> dsa_switch_parse_ports_of > > -> dsa_port_parse_of > > -> of_find_net_device_by_node(of_parse_phandle(dn, "ethernet", 0)); > > -> not found => return -EPROBE_DEFER > > > > When dsa_register_switch() returns -EPROBE_DEFER, it is effectively > > an error path. So the reverse of initialization is performed. > > > > The mv88e6xxx driver calls mv88e6xxx_mdios_register() right _before_ > > dsa_register_switch. So when dsa_register_switch returns error code, > > mv88e6xxx_mdios_unregister() will be called. > > > > When mv88e6xxx_mdios_unregister() is called, the MDIO buses with > > internal PHYs are destroyed. So the PHY devices themselves are destroyed > > too. And the device links between the DSA switch and the internal PHYs, > > those created based on the firmware node links created by fwnode_link_add, > > are dropped too. > > > > Now remember the comment that the device links created based on > > fwnode_link_add are not restored. > > > > So probing of the DSA switch finally resumes, and this time > > device_links_check_suppliers() is effectively bypassed, the PHYs no > > longer request probe deferral due to their supplier not being ready, > > because the device link no longer exists. > > > > Isn't this self-sabotaging?! Yeah, this is a known "issue". I'm saying "issue" because at worst it'd allow a few unnecessary deferred probes. And if you want to break or get fw_devlink to ignore your child devices or your consumers, there are simpler APIs to do it without having to intentionally defer a probe. Fixing this "issue" would just use up more memory and increase boot time for no meaningful benefit. > > > > Now generally, DSA drivers defer probing because they probe in parallel > > with the DSA master. This is typical if the switch is on a SPI bus, or > > I2C, or on an MDIO bus provided by a _standalone_ MDIO controller. > > > > If the MDIO controller is not standalone, but is provided by Ethernet > > controller that is the DSA master itself, then things change a lot, > > because probing can never be parallel. The DSA master probes, > > initializes its MDIO bus, and this triggers the probing of the MDIO > > devices on that bus, one of which is the DSA switch. So DSA can no > > longer defer the probe due to that reason. > > > > Secondly, in DSA we even have variation between drivers as to where they > > register their internal MDIO buses. The mv88e6xxx driver does this in > > mv88e6xxx_probe (the probe function on the MDIO bus). The rtl8366rb > > driver calls realtek_smi_setup_mdio() from rtl8366rb_setup(), and this > > is important. DSA provides drivers with a .setup() callback, which is > > guaranteed to take place after nothing can defer the switch's probe > > anymore. > > > > So putting two and two together, sure enough, if I move mv88e6xxx_mdios_register > > from mv88e6xxx_probe to mv88e6xxx_setup, then I can reliably break this > > setup, because the device links framework isn't sabotaging itself anymore. > > > > Conversely, I am pretty sure that if rtl8366rb was to call of_mdiobus_register() > > from the probe method and not the setup method, the entire design issue > > with interrupts on internal DSA switch ports would have went absolutely > > unnoticed for a few more years. > > > > I have not tested this, but it also seems plausible that DSA can > > trivially and reliably bypass any fw_devlink=on restrictions by simply > > moving all of_mdiobus_register() driver calls from the .setup() method > > to their respective probe methods (prior to calling dsa_register_switch), > > then effectively fabricate an -EPROBE_DEFER during the first probe attempt. > > I mean, who will know whether that probe deferral request was justified > > or not? > > Pushing the thought even further, it is not even necessary to move the > of_mdiobus_register() call to the probe function. Where it is (in .setup) > is already good enough. It is sufficient to return -EOPNOTSUPP once > (the first time) immediately _after_ the call to of_mdiobus_register > (and have a proper error path, i.e. call mdiobus_unregister too). Right, there are plenty of ways to intentionally break fw_devlink. I hope that's not the point :) And I don't think -EOPNOTSUPP would work because your device wouldn't be probed again. > > > Anyway, I'm not sure everyone agrees with this type of "solution" (even > > though it's worth pointing it out as a fw_devlink limitation). In any > > case, we need some sort of lightweight "fix" to the chicken-and-egg > > problem, which will give me enough time to think of something better. I think the generic DSA patch I gave would be the lightweight fix to address this chicken-and-egg issue. As for the long term fix, I'd really suggest looking into using the component device model. I'd even be happy to help make any driver core/component device improvements you might need. I'm also interested in looking into improving the PHY probing so that the genphy never probes a device that has a driver that could probe it. Even outside of all this fw_devlink thing, they way PHY is handled now, if any of the supplier really isn't ready yet (say a clock), then the genphy gets used -- which isn't good. -Saravana > > I hope it is at least clearer now that there are subtleties and nuances, > > and we cannot just assess how many boards are broken by looking at the > > device trees. By design, all are, sure, but they might still work, and > > that's better than nothing...
next prev parent reply other threads:[~2021-09-01 2:20 UTC|newest] Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-08-26 7:45 [PATCH v1 0/2] Fix rtl8366rb issues with fw_devlink=on Saravana Kannan 2021-08-26 7:45 ` [PATCH v1 1/2] driver core: fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT Saravana Kannan 2021-08-26 13:13 ` Andrew Lunn 2021-08-26 19:56 ` Saravana Kannan 2021-08-26 20:53 ` Andrew Lunn 2021-08-26 23:44 ` Saravana Kannan 2021-08-27 1:23 ` Andrew Lunn 2021-08-27 4:02 ` Saravana Kannan 2021-08-27 13:44 ` Andrew Lunn 2021-08-27 18:06 ` Saravana Kannan 2021-08-27 20:11 ` Andrew Lunn 2021-08-27 21:33 ` Saravana Kannan 2021-08-28 17:01 ` Andrew Lunn 2021-08-30 19:03 ` Saravana Kannan 2021-08-31 13:16 ` Andrew Lunn 2021-08-31 19:26 ` Saravana Kannan 2021-08-31 22:05 ` Andrew Lunn 2021-08-31 22:18 ` Saravana Kannan 2021-08-31 23:02 ` Andrew Lunn 2021-08-31 23:18 ` Vladimir Oltean 2021-08-31 23:27 ` Andrew Lunn 2021-09-01 1:28 ` Vladimir Oltean 2021-09-01 1:38 ` Vladimir Oltean 2021-09-01 2:19 ` Saravana Kannan [this message] 2021-09-01 9:02 ` Vladimir Oltean 2021-09-01 22:57 ` Saravana Kannan 2021-09-01 2:00 ` Saravana Kannan 2021-09-01 8:46 ` Vladimir Oltean 2021-09-01 22:53 ` Saravana Kannan 2021-09-02 17:41 ` Andrew Lunn 2021-09-02 17:58 ` Saravana Kannan 2021-09-30 5:33 ` Saravana Kannan 2021-09-30 13:15 ` Andrew Lunn 2021-09-30 13:43 ` Vladimir Oltean 2021-09-30 14:00 ` Andrew Lunn 2021-09-30 17:31 ` Saravana Kannan 2021-09-30 19:38 ` Andrew Lunn 2021-09-30 19:48 ` Saravana Kannan 2021-09-30 20:06 ` Florian Fainelli 2021-09-30 20:14 ` Saravana Kannan 2021-09-30 21:16 ` Florian Fainelli 2021-09-30 20:22 ` Andrew Lunn 2021-09-01 1:46 ` Saravana Kannan 2021-08-31 23:18 ` Andrew Lunn 2021-09-08 18:35 ` Saravana Kannan 2021-09-09 1:39 ` Andrew Lunn 2021-09-09 3:21 ` Saravana Kannan 2021-09-09 10:38 ` Vladimir Oltean 2021-09-09 15:00 ` Andrew Lunn 2021-09-09 16:56 ` Florian Fainelli 2021-08-26 7:45 ` [PATCH v1 2/2] net: dsa: rtl8366rb: Quick fix to work with fw_devlink=on Saravana Kannan 2021-08-26 7:55 ` Greg Kroah-Hartman 2021-08-26 11:25 ` Florian Fainelli 2021-08-26 17:29 ` Saravana Kannan 2021-08-26 11:29 ` Alvin Šipraga 2021-08-26 17:26 ` Saravana Kannan 2021-08-26 18:04 ` Alvin Šipraga
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAGETcx8r7o9u9bveQx6TAXG8YLH+aiuz9VZ5pLACm=S6KxNpWQ@mail.gmail.com' \ --email@example.com \ --cc=ALSI@bang-olufsen.dk \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: [PATCH v1 1/2] driver core: fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).