netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Alvin Šipraga" <ALSI@bang-olufsen.dk>
To: Saravana Kannan <saravanak@google.com>,
	Vladimir Oltean <olteanv@gmail.com>
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Andrew Lunn <andrew@lunn.ch>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Frank Rowand <frowand.list@gmail.com>,
	Rob Herring <robh+dt@kernel.org>
Subject: Re: [PATCH net] net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
Date: Wed, 18 Aug 2021 10:18:06 +0000	[thread overview]
Message-ID: <6b89a9e1-e92e-ca99-9fbd-1d98f6a7864b@bang-olufsen.dk> (raw)
In-Reply-To: <CAGETcx8T-ReJ_Gj-U+nxQyZPsv1v67DRBvpp9hS0fXgGRUQ17w@mail.gmail.com>

Hi Saravana,

On 8/18/21 4:46 AM, Saravana Kannan wrote:
> On Tue, Aug 17, 2021 at 3:31 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>>
>> Hi Alvin,
>>
>> On Tue, Aug 17, 2021 at 09:25:28PM +0000, Alvin Šipraga wrote:
>>> I have an observation that's slightly out of the scope of your patch,
>>> but I'll post here on the off chance that you find it relevant.
>>> Apologies if it's out of place.
>>>
>>> Do these integrated NXP PHYs use a specific PHY driver, or do they just
>>> use the Generic PHY driver?
>>
>> They refuse to probe at all with the Generic PHY driver. I have been
>> caught off guard a few times now when I had a kernel built with
>> CONFIG_NXP_C45_TJA11XX_PHY=n and their probing returns -22 in that case.
>>
>>> If the former is the case, do you experience that the PHY driver fails
>>> to get probed during mdiobus registration if the kernel uses
>>> fw_devlink=on?
>>
>> I don't test with "fw_devlink=on" in /proc/cmdline, this is the first
>> time I do it. It behaves exactly as you say.
>>
>>>
>>> In my case I am writing a new subdriver for realtek-smi, a DSA driver
>>> which registers an internal MDIO bus analogously to sja1105, which is
>>> why I'm asking. I noticed a deferred probe of the PHY driver because the
>>> supplier (ethernet-switch) is not ready - presumably because all of this
>>> is happening in the probe of the switch driver. See below:
>>>
>>> [   83.653213] device_add:3270: device: 'SMI-0': device_add
>>> [   83.653905] device_pm_add:136: PM: Adding info for No Bus:SMI-0
>>> [   83.654055] device_add:3270: device: 'platform:ethernet-switch--mdio_bus:SMI-0': device_add
>>> [   83.654224] device_link_add:843: mdio_bus SMI-0: Linked as a sync state only consumer to ethernet-switch
>>> [   83.654291] libphy: SMI slave MII: probed
>>> ...
>>> [   83.659809] device_add:3270: device: 'SMI-0:00': device_add
>>> [   83.659883] bus_add_device:447: bus: 'mdio_bus': add device SMI-0:00
>>> [   83.659970] device_pm_add:136: PM: Adding info for mdio_bus:SMI-0:00
>>> [   83.660122] device_add:3270: device: 'platform:ethernet-switch--mdio_bus:SMI-0:00': device_add
>>> [   83.660274] devices_kset_move_last:2701: devices_kset: Moving SMI-0:00 to end of list
>>> [   83.660282] device_pm_move_last:203: PM: Moving mdio_bus:SMI-0:00 to end of list
>>> [   83.660293] device_link_add:859: mdio_bus SMI-0:00: Linked as a consumer to ethernet-switch
>>> [   83.660350] __driver_probe_device:736: bus: 'mdio_bus': __driver_probe_device: matched device SMI-0:00 with driver RTL8365MB-VC Gigabit Ethernet
>>> [   83.660365] device_links_check_suppliers:1001: mdio_bus SMI-0:00: probe deferral - supplier ethernet-switch not ready
>>> [   83.660376] driver_deferred_probe_add:138: mdio_bus SMI-0:00: Added to deferred list
>>
>> So it's a circular dependency? Switch cannot finish probing because it
>> cannot connect to PHY, which cannot probe because switch has not
>> finished probing, which....
> 
> Hi Vladimir/Alvin,
> 
> If there's a cyclic dependency between two devices, then fw_devlink=on
> is smart enough to notice that. Once it notices a cycle, it knows that
> it can't tell which one is the real dependency and which one is the
> false dependency and so stops enforcing ordering between the devices
> in the cycle.
> 
> But fw_devlink doesn't understand all the properties yet. Just most of
> them and I'm always trying to add more. So when it only understands
> the property that's causing the false dependency but not the property
> that causes the real dependency, it can cause issues like this where
> fw_devlink=on enforces the false dependency and the driver/code
> enforces the real dependency. These are generally easy to fix -- you
> just need to teach fw_devlink how to parse more properties.
> 
> This is just a preliminary analysis since I don't have all the info
> yet -- so I could be wrong. With that said, I happened to be working
> on adding fw_devlink support for phy-handle property and I think it
> should fix your issue with fw_devlink=on. Can you give [1] a shot?

I tried [1] but it did not seem to have any effect.

> 
> If it doesn't fix it, can one of you please point me to an upstream
> dts (not dtsi) file for a platform in which you see this issue? And
> ideally also the DT nodes and their drivers that are involved in this
> cycle? With that info, I should be able to root cause this if the
> patch above doesn't already fix it.

I'm working with a non-upstream dts - maybe Vladimir is using an 
upstream one? The pattern among the drivers/dts is common between our 
two cases.

But for the sake of this discussion, my dts is pretty much the same as 
what you will find in arch/arm/boot/dts/gemini-dlink-dir-685.dts. The 
nodes of interest from that dts file are below, and the driver is in 
drivers/net/ds/{realtek-smi-core.c,rtl8366rb.c}. It's expected that the 
Realtek PHY driver in drivers/net/phy/realtek.c will get probed as part 
of the mdiobus registration, but that never happens. See my previous 
reply for a debug log.

/ {
	switch {
		compatible = "realtek,rtl8366rb";
		mdc-gpios = <&gpio0 21 GPIO_ACTIVE_HIGH>;
		mdio-gpios = <&gpio0 22 GPIO_ACTIVE_HIGH>;
		reset-gpios = <&gpio0 14 GPIO_ACTIVE_LOW>;
		realtek,disable-leds;

		switch_intc: interrupt-controller {
			/* GPIO 15 provides the interrupt */
			interrupt-parent = <&gpio0>;
			interrupts = <15 IRQ_TYPE_LEVEL_LOW>;
			interrupt-controller;
			#address-cells = <0>;
			#interrupt-cells = <1>;
		};

		ports {
			/* snip */
		};

		mdio {
			compatible = "realtek,smi-mdio";
			#address-cells = <1>;
			#size-cells = <0>;

			phy0: phy@0 {
				reg = <0>;
				interrupt-parent = <&switch_intc>;
				interrupts = <0>;
			};
			phy1: phy@1 {
				reg = <1>;
				interrupt-parent = <&switch_intc>;
				interrupts = <1>;
			};
			phy2: phy@2 {
				reg = <2>;
				interrupt-parent = <&switch_intc>;
				interrupts = <2>;
			};
			phy3: phy@3 {
				reg = <3>;
				interrupt-parent = <&switch_intc>;
				interrupts = <3>;
			};
			phy4: phy@4 {
				reg = <4>;
				interrupt-parent = <&switch_intc>;
				interrupts = <12>;
			};
		};
	};
};

Thanks for looking into this. Let me know if you need any other info or 
want something tested.

Kind regards,
Alvin

> 
>>
>> So how is it supposed to be solved then? Intuitively the 'mdio_bus SMI-0:00'
>> device should not be added to the deferred list, it should have everything
>> it needs right now (after all, it works without fw_devlink). No?
>>
>> It might be the late hour over here too, but right now I just don't
>> know. Let me add Saravana to the discussion too, he made an impressive
>> analysis recently on a PHY probing issue with mdio-mux,
> 
> Lol, thanks for the kind words.
> 
>> so the PHY
>> library probing dependencies should still be fresh in his mind, maybe he
>> has an idea what's wrong.
> 
> [1] - https://lore.kernel.org/lkml/20210818021717.3268255-1-saravanak@google.com/T/#u>> 
> Thanks,
> Saravana
> 

  reply	other threads:[~2021-08-18 10:18 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-17 14:52 [PATCH net] net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse Vladimir Oltean
2021-08-17 21:25 ` Alvin Šipraga
2021-08-17 22:05   ` Andrew Lunn
2021-08-17 22:19     ` Alvin Šipraga
2021-08-17 22:31   ` Vladimir Oltean
2021-08-17 22:40     ` Vladimir Oltean
2021-08-17 23:01     ` Alvin Šipraga
2021-08-18  2:46     ` Saravana Kannan
2021-08-18 10:18       ` Alvin Šipraga [this message]
2021-08-19  3:28         ` Saravana Kannan
2021-08-19 11:22           ` Vladimir Oltean
2021-08-19 13:46             ` Alvin Šipraga
2021-08-20  0:50             ` Saravana Kannan
2021-08-19 13:35           ` Andrew Lunn
2021-08-19 23:52             ` Saravana Kannan
2021-08-20  0:37               ` Vladimir Oltean
2021-08-20  1:25                 ` Saravana Kannan
2021-08-20 13:01               ` Andrew Lunn
2021-08-19 13:42           ` Alvin Šipraga
2021-08-20  1:08             ` Saravana Kannan
2021-08-20 16:52               ` Saravana Kannan
2021-08-20 17:54                 ` Andrew Lunn
2021-08-20 18:10                   ` Saravana Kannan
2021-08-22 14:19                 ` Alvin Šipraga
2021-08-23 18:50                   ` Saravana Kannan
2021-08-23 20:43                     ` Andrew Lunn
2021-08-23 21:23                       ` Saravana Kannan
2021-08-25 13:40                     ` Alvin Šipraga
2021-08-26  5:33                       ` Saravana Kannan
2021-08-26  7:49                         ` Saravana Kannan
2021-08-26 11:09                         ` Alvin Šipraga
2021-08-18  9:30 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6b89a9e1-e92e-ca99-9fbd-1d98f6a7864b@bang-olufsen.dk \
    --to=alsi@bang-olufsen.dk \
    --cc=andrew@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=f.fainelli@gmail.com \
    --cc=frowand.list@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=robh+dt@kernel.org \
    --cc=saravanak@google.com \
    --cc=vivien.didelot@gmail.com \
    --cc=vladimir.oltean@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).