All of lore.kernel.org
 help / color / mirror / Atom feed
* Lockup in phy_probe() for MDIO device (Broadcom's switch)
@ 2021-09-30  9:58 Rafał Miłecki
  2021-09-30 10:15 ` Rafał Miłecki
  2021-09-30 10:17 ` Russell King (Oracle)
  0 siblings, 2 replies; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30  9:58 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit, Russell King, Network Development
  Cc: Florian Fainelli, BCM Kernel Feedback, Vivek Unune

Hi,

I've just received a report of kernel lockup after switching OpenWrt
platform from kernel 5.4 to kernel 5.10:
https://bugs.openwrt.org/index.php?do=details&task_id=4055

The problem is phy_probe() and its:
mutex_lock(&phydev->lock);

It seems to me that "lock" mutex doesn't get initalized. It seems
phy_device_create() doesn't get called for an MDIO device.

This isn't necessarily a PHY / MDIO regression. It could be some core
change that exposed a PHY / MDIO bug.


*** Lockup ***

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 5.10.64 (rmilecki@localhost.localdomain) (arm-openwrt-linux-muslgnueabi-gcc (OpenWrt GCC 11.2.0 r17558+1-71e96532df) 11.2.0, GNU ld (GNU Binutils) 2.36.1) #0 SMP Wed Sep 29 20:08:07 2021
(...)
[    5.592447] libphy: Fixed MDIO Bus: probed
[    5.596809] [of_mdiobus_register:254] np:/mdio@18003000
[    5.602333] libphy: iProc MDIO bus: probed
[    5.606479] iproc-mdio 18003000.mdio: Broadcom iProc MDIO bus registered
[    5.613439] [of_mdiobus_register:254] np:/mdio-mux@18003000/mdio@0
[    5.620101] libphy: mdio_mux: probed
[    5.623709] [of_mdiobus_register:282] child:/mdio-mux@18003000/mdio@0/usb3-phy@10
[    5.631571] [of_mdiobus_register:254] np:/mdio-mux@18003000/mdio@200
[    5.638426] libphy: mdio_mux: probed
[    5.642032] [of_mdiobus_register:282] child:/mdio-mux@18003000/mdio@200/switch@0
[    5.649841] ------------[ cut here ]------------
[    5.654503] WARNING: CPU: 0 PID: 1 at drivers/net/phy/phy_device.c:2839 phy_probe+0x58/0x1e8
[    5.662983] Modules linked in:
[    5.666055] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.64 #0
[    5.672074] Hardware name: BCM5301X
[    5.675587] [<c0108410>] (unwind_backtrace) from [<c0104bc4>] (show_stack+0x10/0x14)
[    5.683359] [<c0104bc4>] (show_stack) from [<c03dbfc8>] (dump_stack+0x94/0xa8)
[    5.690609] [<c03dbfc8>] (dump_stack) from [<c01183e4>] (__warn+0xb8/0x114)
[    5.697591] [<c01183e4>] (__warn) from [<c01184a8>] (warn_slowpath_fmt+0x68/0x78)
[    5.705095] [<c01184a8>] (warn_slowpath_fmt) from [<c04b85d0>] (phy_probe+0x58/0x1e8)
[    5.712951] [<c04b85d0>] (phy_probe) from [<c04569f8>] (really_probe+0xfc/0x4e0)
[    5.720361] [<c04569f8>] (really_probe) from [<c0454c50>] (bus_for_each_drv+0x74/0x98)
[    5.728298] [<c0454c50>] (bus_for_each_drv) from [<c0456f90>] (__device_attach+0xcc/0x120)
[    5.736584] [<c0456f90>] (__device_attach) from [<c0455bd8>] (bus_probe_device+0x84/0x8c)
[    5.744782] [<c0455bd8>] (bus_probe_device) from [<c0452284>] (device_add+0x300/0x77c)
[    5.752724] [<c0452284>] (device_add) from [<c04b9c4c>] (mdio_device_register+0x24/0x48)
[    5.760836] [<c04b9c4c>] (mdio_device_register) from [<c04c15d4>] (of_mdiobus_register+0x1f8/0x330)
[    5.769904] [<c04c15d4>] (of_mdiobus_register) from [<c04c1c1c>] (mdio_mux_init+0x178/0x2c0)
[    5.778363] [<c04c1c1c>] (mdio_mux_init) from [<c04c1ef8>] (mdio_mux_mmioreg_probe+0x138/0x1fc)
[    5.787089] [<c04c1ef8>] (mdio_mux_mmioreg_probe) from [<c04587bc>] (platform_drv_probe+0x34/0x70)
[    5.796066] [<c04587bc>] (platform_drv_probe) from [<c04569f8>] (really_probe+0xfc/0x4e0)
[    5.804266] [<c04569f8>] (really_probe) from [<c04573dc>] (device_driver_attach+0xe4/0xf4)
[    5.812552] [<c04573dc>] (device_driver_attach) from [<c0457468>] (__driver_attach+0x7c/0x110)
[    5.821186] [<c0457468>] (__driver_attach) from [<c0454bb0>] (bus_for_each_dev+0x64/0x90)
[    5.829385] [<c0454bb0>] (bus_for_each_dev) from [<c0455dd0>] (bus_add_driver+0xf8/0x1e0)
[    5.837585] [<c0455dd0>] (bus_add_driver) from [<c0457a74>] (driver_register+0x88/0x118)
[    5.845697] [<c0457a74>] (driver_register) from [<c01017e4>] (do_one_initcall+0x54/0x1e8)
[    5.853907] [<c01017e4>] (do_one_initcall) from [<c0801118>] (kernel_init_freeable+0x23c/0x290)
[    5.862628] [<c0801118>] (kernel_init_freeable) from [<c065a550>] (kernel_init+0x8/0x118)
[    5.870826] [<c065a550>] (kernel_init) from [<c0100128>] (ret_from_fork+0x14/0x2c)
[    5.878413] Exception stack(0xc1035fb0 to 0xc1035ff8)
[    5.883470] 5fa0:                                     00000000 00000000 00000000 00000000
[    5.891662] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    5.899852] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    5.906509] ---[ end trace 6a8fa3807352bffb ]---
[    5.911144] Broadcom B53 (2) 0.200:00: [phy_probe:2840] TAKING LOCK...
[   26.924625] rcu: INFO: rcu_sched self-detected stall on CPU
[   26.930213] rcu:     0-....: (2099 ticks this GP) idle=e3e/1/0x40000002 softirq=109/109 fqs=1050
[   26.938844]  (t=2100 jiffies g=-1111 q=287)
[   26.943031] NMI backtrace for cpu 0
[   26.946523] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.10.64 #0
[   26.953934] Hardware name: BCM5301X
[   26.957437] [<c0108410>] (unwind_backtrace) from [<c0104bc4>] (show_stack+0x10/0x14)
[   26.965206] [<c0104bc4>] (show_stack) from [<c03dbfc8>] (dump_stack+0x94/0xa8)
[   26.972450] [<c03dbfc8>] (dump_stack) from [<c03e3890>] (nmi_cpu_backtrace+0xc8/0xf4)
[   26.980295] [<c03e3890>] (nmi_cpu_backtrace) from [<c03e39c0>] (nmi_trigger_cpumask_backtrace+0x104/0x13c)
[   26.989974] [<c03e39c0>] (nmi_trigger_cpumask_backtrace) from [<c01700a0>] (rcu_dump_cpu_stacks+0xe4/0x10c)
[   26.999743] [<c01700a0>] (rcu_dump_cpu_stacks) from [<c0175444>] (rcu_sched_clock_irq+0x6e4/0x8ac)
[   27.008731] [<c0175444>] (rcu_sched_clock_irq) from [<c017b47c>] (update_process_times+0x88/0xbc)
[   27.017632] [<c017b47c>] (update_process_times) from [<c018ac8c>] (tick_sched_timer+0x78/0x274)
[   27.026349] [<c018ac8c>] (tick_sched_timer) from [<c017b9e8>] (__hrtimer_run_queues+0x15c/0x218)
[   27.035157] [<c017b9e8>] (__hrtimer_run_queues) from [<c017c5c8>] (hrtimer_interrupt+0x11c/0x298)
[   27.044056] [<c017c5c8>] (hrtimer_interrupt) from [<c0107a30>] (twd_handler+0x34/0x3c)
[   27.051993] [<c0107a30>] (twd_handler) from [<c0167c68>] (handle_percpu_devid_irq+0x78/0x148)
[   27.060547] [<c0167c68>] (handle_percpu_devid_irq) from [<c0162470>] (__handle_domain_irq+0x84/0xd8)
[   27.069706] [<c0162470>] (__handle_domain_irq) from [<c03f48e8>] (gic_handle_irq+0x80/0x94)
[   27.078076] [<c03f48e8>] (gic_handle_irq) from [<c0100aec>] (__irq_svc+0x6c/0x90)
[   27.085575] Exception stack(0xc1035c28 to 0xc1035c70)
[   27.090634] 5c20:                   c116648c 00000000 0000c116 00006488 c1166488 ffffe000
[   27.098825] 5c40: 00000000 c1034000 00000002 c0982be8 c116648c 00000000 00000000 c1035c78
[   27.107022] 5c60: c065ce58 c065fe64 80000013 ffffffff
[   27.112086] [<c0100aec>] (__irq_svc) from [<c065fe64>] (_raw_spin_lock+0x2c/0x40)
[   27.119585] [<c065fe64>] (_raw_spin_lock) from [<c065ce58>] (__mutex_lock.constprop.0+0x1b8/0x520)
[   27.128571] [<c065ce58>] (__mutex_lock.constprop.0) from [<c04b85f4>] (phy_probe+0x7c/0x1e8)
[   27.137032] [<c04b85f4>] (phy_probe) from [<c04569f8>] (really_probe+0xfc/0x4e0)
[   27.144444] [<c04569f8>] (really_probe) from [<c0454c50>] (bus_for_each_drv+0x74/0x98)
[   27.152380] [<c0454c50>] (bus_for_each_drv) from [<c0456f90>] (__device_attach+0xcc/0x120)
[   27.160667] [<c0456f90>] (__device_attach) from [<c0455bd8>] (bus_probe_device+0x84/0x8c)
[   27.168865] [<c0455bd8>] (bus_probe_device) from [<c0452284>] (device_add+0x300/0x77c)
[   27.176797] [<c0452284>] (device_add) from [<c04b9c4c>] (mdio_device_register+0x24/0x48)
[   27.184911] [<c04b9c4c>] (mdio_device_register) from [<c04c15d4>] (of_mdiobus_register+0x1f8/0x330)
[   27.193977] [<c04c15d4>] (of_mdiobus_register) from [<c04c1c1c>] (mdio_mux_init+0x178/0x2c0)
[   27.202437] [<c04c1c1c>] (mdio_mux_init) from [<c04c1ef8>] (mdio_mux_mmioreg_probe+0x138/0x1fc)
[   27.211154] [<c04c1ef8>] (mdio_mux_mmioreg_probe) from [<c04587bc>] (platform_drv_probe+0x34/0x70)
[   27.220133] [<c04587bc>] (platform_drv_probe) from [<c04569f8>] (really_probe+0xfc/0x4e0)
[   27.228331] [<c04569f8>] (really_probe) from [<c04573dc>] (device_driver_attach+0xe4/0xf4)
[   27.236609] [<c04573dc>] (device_driver_attach) from [<c0457468>] (__driver_attach+0x7c/0x110)
[   27.245243] [<c0457468>] (__driver_attach) from [<c0454bb0>] (bus_for_each_dev+0x64/0x90)
[   27.253434] [<c0454bb0>] (bus_for_each_dev) from [<c0455dd0>] (bus_add_driver+0xf8/0x1e0)
[   27.261633] [<c0455dd0>] (bus_add_driver) from [<c0457a74>] (driver_register+0x88/0x118)
[   27.269746] [<c0457a74>] (driver_register) from [<c01017e4>] (do_one_initcall+0x54/0x1e8)
[   27.277949] [<c01017e4>] (do_one_initcall) from [<c0801118>] (kernel_init_freeable+0x23c/0x290)
[   27.286667] [<c0801118>] (kernel_init_freeable) from [<c065a550>] (kernel_init+0x8/0x118)
[   27.294865] [<c065a550>] (kernel_init) from [<c0100128>] (ret_from_fork+0x14/0x2c)
[   27.302452] Exception stack(0xc1035fb0 to 0xc1035ff8)
[   27.307511] 5fa0:                                     00000000 00000000 00000000 00000000
[   27.315702] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   27.323891] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000


*** Device tree ***

Base: arch/arm/boot/dts/bcm5301x.dtsi

Relevant part:

mdio-bus-mux@18003000 {
	compatible = "mdio-mux-mmioreg";
	mdio-parent-bus = <&mdio>;
	#address-cells = <1>;
	#size-cells = <0>;
	reg = <0x18003000 0x4>;
	mux-mask = <0x200>;

	mdio@0 {
		reg = <0x0>;
		#address-cells = <1>;
		#size-cells = <0>;

		usb3_phy: usb3-phy@10 {
			compatible = "brcm,ns-ax-usb3-phy";
			reg = <0x10>;
			usb3-dmp-syscon = <&usb3_dmp>;
			#phy-cells = <0>;
			status = "disabled";
		};
	};

	mdio@200 {
		reg = <0x200>;
		#address-cells = <1>;
		#size-cells = <0>;

		switch@0  {
			compatible = "brcm,bcm53125";
			#address-cells = <1>;
			#size-cells = <0>;
			reset-gpios = <&chipcommon 10 GPIO_ACTIVE_LOW>;
			reset-names = "robo_reset";
			reg = <0>;
			dsa,member = <1 0>;
			pinctrl-names = "default";
			pinctrl-0 = <&pinmux_mdio>;

			ports {
				#address-cells = <1>;
				#size-cells = <0>;

				port@0 {
					reg = <0>;
					label = "lan1";
				};

				port@1 {
					reg = <1>;
					label = "lan5";
				};

				port@2 {
					reg = <2>;
					label = "lan2";
				};

				port@3 {
					reg = <3>;
					label = "lan6";
				};

				port@4 {
					reg = <4>;
					label = "lan3";
				};
			};
		};
	};
};


*** Used debugging diff ***

diff --git a/drivers/net/mdio/of_mdio.c b/drivers/net/mdio/of_mdio.c
index 4daf94bb5..dde775c92 100644
--- a/drivers/net/mdio/of_mdio.c
+++ b/drivers/net/mdio/of_mdio.c
@@ -251,6 +251,7 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)
         bool scanphys = false;
         int addr, rc;

+pr_info("[%s:%d] np:%pOF\n", __func__, __LINE__, np);
         if (!np)
                 return mdiobus_register(mdio);

@@ -278,6 +279,7 @@ int of_mdiobus_register(struct mii_bus *mdio, struct device_node *np)

         /* Loop over the child nodes and register a phy_device for each phy */
         for_each_available_child_of_node(np, child) {
+pr_info("[%s:%d] child:%pOF\n", __func__, __LINE__, child);
                 addr = of_mdio_parse_addr(&mdio->dev, child);
                 if (addr < 0) {
                         scanphys = true;
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 950277e4d..a0a46af82 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -592,6 +592,7 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, u32 phy_id,

         dev->state = PHY_DOWN;

+dev_info(&mdiodev->dev, "[%s:%d] INIT MUTEX\n", __func__, __LINE__);
         mutex_init(&dev->lock);
         INIT_DELAYED_WORK(&dev->state_queue, phy_state_machine);

@@ -2835,7 +2836,10 @@ static int phy_probe(struct device *dev)
         if (phydrv->flags & PHY_IS_INTERNAL)
                 phydev->is_internal = true;

+WARN_ON(1);
+dev_info(dev, "[%s:%d] TAKING LOCK...\n", __func__, __LINE__);
         mutex_lock(&phydev->lock);
+dev_info(dev, "[%s:%d] LOCKED\n", __func__, __LINE__);

         /* Deassert the reset signal */
         phy_device_reset(phydev, 0);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30  9:58 Lockup in phy_probe() for MDIO device (Broadcom's switch) Rafał Miłecki
@ 2021-09-30 10:15 ` Rafał Miłecki
  2021-09-30 10:17 ` Russell King (Oracle)
  1 sibling, 0 replies; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30 10:15 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit, Russell King, Network Development
  Cc: Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On 30.09.2021 11:58, Rafał Miłecki wrote:
> I've just received a report of kernel lockup after switching OpenWrt
> platform from kernel 5.4 to kernel 5.10:
> https://bugs.openwrt.org/index.php?do=details&task_id=4055
> 
> The problem is phy_probe() and its:
> mutex_lock(&phydev->lock);
> 
> It seems to me that "lock" mutex doesn't get initalized. It seems
> phy_device_create() doesn't get called for an MDIO device.

dmesg without debugging diff but with CONFIG_DEBUG_MUTEXES=y

[    6.227101] libphy: Fixed MDIO Bus: probed
[    6.231599] libphy: iProc MDIO bus: probed
[    6.235746] iproc-mdio 18003000.mdio: Broadcom iProc MDIO bus registered
[    6.243141] libphy: mdio_mux: probed
[    6.247559] libphy: mdio_mux: probed
[    6.251517] ------------[ cut here ]------------
[    6.256192] WARNING: CPU: 1 PID: 1 at kernel/locking/mutex.c:951 __mutex_lock.constprop.0+0x744/0x848
[    6.265439] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[    6.265442] Modules linked in:
[    6.273470] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.64 #0
[    6.279488] Hardware name: BCM5301X
[    6.283001] [<c0108410>] (unwind_backtrace) from [<c0104bc4>] (show_stack+0x10/0x14)
[    6.290775] [<c0104bc4>] (show_stack) from [<c03dc6a8>] (dump_stack+0x94/0xa8)
[    6.298024] [<c03dc6a8>] (dump_stack) from [<c01183e8>] (__warn+0xb8/0x114)
[    6.305004] [<c01183e8>] (__warn) from [<c01184ac>] (warn_slowpath_fmt+0x68/0x78)
[    6.312509] [<c01184ac>] (warn_slowpath_fmt) from [<c065deec>] (__mutex_lock.constprop.0+0x744/0x848)
[    6.321759] [<c065deec>] (__mutex_lock.constprop.0) from [<c04b8d80>] (phy_probe+0x48/0x198)
[    6.330229] [<c04b8d80>] (phy_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    6.337646] [<c0457120>] (really_probe) from [<c0455378>] (bus_for_each_drv+0x74/0x98)
[    6.345583] [<c0455378>] (bus_for_each_drv) from [<c04576b8>] (__device_attach+0xcc/0x120)
[    6.353868] [<c04576b8>] (__device_attach) from [<c0456300>] (bus_probe_device+0x84/0x8c)
[    6.362067] [<c0456300>] (bus_probe_device) from [<c04529ac>] (device_add+0x300/0x77c)
[    6.370008] [<c04529ac>] (device_add) from [<c04ba38c>] (mdio_device_register+0x24/0x48)
[    6.378121] [<c04ba38c>] (mdio_device_register) from [<c04c1ab8>] (of_mdiobus_register+0x198/0x2fc)
[    6.387188] [<c04c1ab8>] (of_mdiobus_register) from [<c04c233c>] (mdio_mux_init+0x178/0x2c0)
[    6.395647] [<c04c233c>] (mdio_mux_init) from [<c04c2618>] (mdio_mux_mmioreg_probe+0x138/0x1fc)
[    6.404375] [<c04c2618>] (mdio_mux_mmioreg_probe) from [<c0458ee4>] (platform_drv_probe+0x34/0x70)
[    6.413352] [<c0458ee4>] (platform_drv_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    6.421550] [<c0457120>] (really_probe) from [<c0457b04>] (device_driver_attach+0xe4/0xf4)
[    6.429837] [<c0457b04>] (device_driver_attach) from [<c0457b90>] (__driver_attach+0x7c/0x110)
[    6.438471] [<c0457b90>] (__driver_attach) from [<c04552d8>] (bus_for_each_dev+0x64/0x90)
[    6.446669] [<c04552d8>] (bus_for_each_dev) from [<c04564f8>] (bus_add_driver+0xf8/0x1e0)
[    6.454861] [<c04564f8>] (bus_add_driver) from [<c045819c>] (driver_register+0x88/0x118)
[    6.462974] [<c045819c>] (driver_register) from [<c01017e4>] (do_one_initcall+0x54/0x1e8)
[    6.471180] [<c01017e4>] (do_one_initcall) from [<c0801118>] (kernel_init_freeable+0x23c/0x290)
[    6.479904] [<c0801118>] (kernel_init_freeable) from [<c065acf0>] (kernel_init+0x8/0x118)
[    6.488100] [<c065acf0>] (kernel_init) from [<c0100128>] (ret_from_fork+0x14/0x2c)
[    6.495680] Exception stack(0xc1035fb0 to 0xc1035ff8)
[    6.500739] 5fa0:                                     00000000 00000000 00000000 00000000
[    6.508938] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.517128] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    6.523776] ---[ end trace 3cbdf5abbf86c3cc ]---
[    6.528422] 8<--- cut here ---
[    6.531479] Unable to handle kernel NULL pointer dereference at virtual address 00000180
[    6.539599] pgd = 1f10e6e2
[    6.542306] [00000180] *pgd=00000000
[    6.545896] Internal error: Oops: 805 [#1] SMP ARM
[    6.550698] Modules linked in:
[    6.553755] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W         5.10.64 #0
[    6.561167] Hardware name: BCM5301X
[    6.564663] PC is at __mutex_add_waiter+0x34/0x60
[    6.569376] LR is at __mutex_add_waiter+0x24/0x60
[    6.574085] pc : [<c01595cc>]    lr : [<c01595bc>]    psr: 80000013
[    6.580360] sp : c1035c40  ip : 00000000  fp : c1038000
[    6.585591] r10: c090530c  r9 : c06f56f4  r8 : c1035c74
[    6.590823] r7 : c09b24ac  r6 : c13b9490  r5 : c13b949c  r4 : c1035c74
[    6.597360] r3 : 00000180  r2 : c1038000  r1 : c1035c74  r0 : c13b9490
[    6.603897] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    6.611043] Control: 10c5387d  Table: 0000404a  DAC: 00000051
[    6.616796] Process swapper/0 (pid: 1, stack limit = 0xde2e9975)
[    6.622811] Stack: (0xc1035c40 to 0xc1036000)
[    6.627174] 5c40: c13b9490 c13b948c 00000004 c065da94 c06f56e0 c0763cc8 c1034000 00000002
[    6.635372] 5c60: c0982e64 c13b9494 c13cc000 c02da7f4 00000000 c13b949c 00000180 11111111
[    6.643571] 5c80: 11111111 c1035c74 00000000 c13b9200 c0982e64 00000000 c13b9490 00000000
[    6.651770] 5ca0: c0982e64 00000000 c0751344 c04b8d80 c13b9200 c09c0880 00000000 c09c0870
[    6.659969] 5cc0: 00000000 c0457120 00000000 00000000 c1035d08 c04578ec c12cf578 00000000
[    6.668161] 5ce0: c097eb54 00000000 c0751344 c0455378 c119ab70 c127f5b8 c13b9200 00000001
[    6.676360] 5d00: c13b9244 c04576b8 c13b9200 00000001 c13b9200 c13b9200 c098284c c0456300
[    6.684559] 5d20: c13b9200 00000000 c09c0740 c04529ac 00000000 00000000 00000000 c03e23c8
[    6.692757] 5d40: c13b9200 0a3031d0 00000000 00000000 c13b9200 c6973ddc 00000000 00000000
[    6.700949] 5d60: c13b9200 c04ba38c c6973ef8 c12cf000 c6973ddc c04c1ab8 00000000 0000003d
[    6.709148] 5d80: ffffff0f 00000001 00000000 c12cf578 c0755054 c0753064 00000000 00000000
[    6.717347] 5da0: c6973974 c11f67c0 c11f6540 c6973ddc 00000000 c1153410 c6973974 c0751344
[    6.725545] 5dc0: c0755c60 c04c233c 00000200 c04de78c c0755c54 c11f64c0 c1035e18 00000200
[    6.733736] 5de0: 00000200 00000000 c11f64c0 c6973974 c0751344 c1153400 c1153410 00000000
[    6.741927] 5e00: c0839bc8 c04c2618 c11f64c0 c12cb000 00000000 00000004 18003000 18003003
[    6.750126] 5e20: c69739cc 00000200 00000000 00000000 00000000 00000000 c0984350 c1153410
[    6.758325] 5e40: c0984350 00000000 c09c0870 00000000 c0984350 c0458ee4 c1153410 c09c0880
[    6.766516] 5e60: 00000000 c0457120 00000000 c1153410 00000000 c1153454 c0984350 c0831854
[    6.774715] 5e80: c0831834 c08003e4 c0839bc8 c0457b04 00000000 c0984350 c1153410 c097ee50
[    6.782914] 5ea0: c0831854 c0457b90 00000000 c0984350 c0457b14 c04552d8 c100e35c c114c534
[    6.791113] 5ec0: c0984350 c11f6400 00000000 c04564f8 c0755df0 ffffe000 00000000 c0984350
[    6.799312] 5ee0: 00000000 ffffe000 00000000 c045819c c098e010 c081b88c ffffe000 c01017e4
[    6.807502] 5f00: c108cb00 c108cb15 c07ac224 00000000 0000005f c013593c 00000dc0 c07abacc
[    6.815692] 5f20: c0730bbc 00000006 00000006 c07002b0 c06f4888 c06f483c c108cb15 00000000
[    6.823884] 5f40: 00000000 00000007 c108cb00 0a3031d0 c098e020 00000007 c108cb00 c07abacc
[    6.832082] 5f60: c098e020 c0801118 00000006 00000006 00000000 c08003e4 00000000 0000005f
[    6.840273] 5f80: 00000000 00000000 c065ace8 00000000 00000000 00000000 00000000 00000000
[    6.848464] 5fa0: 00000000 c065acf0 00000000 c0100128 00000000 00000000 00000000 00000000
[    6.856662] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.864853] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    6.873051] [<c01595cc>] (__mutex_add_waiter) from [<c065da94>] (__mutex_lock.constprop.0+0x2ec/0x848)
[    6.882378] [<c065da94>] (__mutex_lock.constprop.0) from [<c04b8d80>] (phy_probe+0x48/0x198)
[    6.890841] [<c04b8d80>] (phy_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    6.898252] [<c0457120>] (really_probe) from [<c0455378>] (bus_for_each_drv+0x74/0x98)
[    6.906189] [<c0455378>] (bus_for_each_drv) from [<c04576b8>] (__device_attach+0xcc/0x120)
[    6.914475] [<c04576b8>] (__device_attach) from [<c0456300>] (bus_probe_device+0x84/0x8c)
[    6.922675] [<c0456300>] (bus_probe_device) from [<c04529ac>] (device_add+0x300/0x77c)
[    6.930614] [<c04529ac>] (device_add) from [<c04ba38c>] (mdio_device_register+0x24/0x48)
[    6.938727] [<c04ba38c>] (mdio_device_register) from [<c04c1ab8>] (of_mdiobus_register+0x198/0x2fc)
[    6.947794] [<c04c1ab8>] (of_mdiobus_register) from [<c04c233c>] (mdio_mux_init+0x178/0x2c0)
[    6.956255] [<c04c233c>] (mdio_mux_init) from [<c04c2618>] (mdio_mux_mmioreg_probe+0x138/0x1fc)
[    6.964978] [<c04c2618>] (mdio_mux_mmioreg_probe) from [<c0458ee4>] (platform_drv_probe+0x34/0x70)
[    6.973958] [<c0458ee4>] (platform_drv_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    6.982157] [<c0457120>] (really_probe) from [<c0457b04>] (device_driver_attach+0xe4/0xf4)
[    6.990444] [<c0457b04>] (device_driver_attach) from [<c0457b90>] (__driver_attach+0x7c/0x110)
[    6.999079] [<c0457b90>] (__driver_attach) from [<c04552d8>] (bus_for_each_dev+0x64/0x90)
[    7.007276] [<c04552d8>] (bus_for_each_dev) from [<c04564f8>] (bus_add_driver+0xf8/0x1e0)
[    7.015467] [<c04564f8>] (bus_add_driver) from [<c045819c>] (driver_register+0x88/0x118)
[    7.023574] [<c045819c>] (driver_register) from [<c01017e4>] (do_one_initcall+0x54/0x1e8)
[    7.031775] [<c01017e4>] (do_one_initcall) from [<c0801118>] (kernel_init_freeable+0x23c/0x290)
[    7.040493] [<c0801118>] (kernel_init_freeable) from [<c065acf0>] (kernel_init+0x8/0x118)
[    7.048690] [<c065acf0>] (kernel_init) from [<c0100128>] (ret_from_fork+0x14/0x2c)
[    7.056269] Exception stack(0xc1035fb0 to 0xc1035ff8)
[    7.061327] 5fa0:                                     00000000 00000000 00000000 00000000
[    7.069519] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    7.077709] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    7.084335] Code: e5953004 e5854004 e5845000 e5843004 (e5834000)
[    7.090456] ---[ end trace 3cbdf5abbf86c3cd ]---
[    7.095086] Kernel panic - not syncing: Fatal exception
[    7.100325] CPU0: stopping
[    7.103038] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D W         5.10.64 #0
[    7.110450] Hardware name: BCM5301X
[    7.113951] [<c0108410>] (unwind_backtrace) from [<c0104bc4>] (show_stack+0x10/0x14)
[    7.121716] [<c0104bc4>] (show_stack) from [<c03dc6a8>] (dump_stack+0x94/0xa8)
[    7.128953] [<c03dc6a8>] (dump_stack) from [<c0106c80>] (do_handle_IPI+0xf8/0x12c)
[    7.136542] [<c0106c80>] (do_handle_IPI) from [<c0106ccc>] (ipi_handler+0x18/0x20)
[    7.144140] [<c0106ccc>] (ipi_handler) from [<c0162ae4>] (__handle_domain_irq+0x84/0xd8)
[    7.152256] [<c0162ae4>] (__handle_domain_irq) from [<c03f4fc8>] (gic_handle_irq+0x80/0x94)
[    7.160624] [<c03f4fc8>] (gic_handle_irq) from [<c0100aec>] (__irq_svc+0x6c/0x90)
[    7.168124] Exception stack(0xc0901f48 to 0xc0901f90)
[    7.173183] 1f40:                   00083bd6 00000000 00083bd8 c010e540 c0900000 00000000
[    7.181374] 1f60: c0904f14 c0904f54 c0831a34 413fc090 10c5387d 00000000 00000000 c0901f98
[    7.189572] 1f80: c0102644 c0102648 60000013 ffffffff
[    7.194631] [<c0100aec>] (__irq_svc) from [<c0102648>] (arch_cpu_idle+0x38/0x3c)
[    7.202058] [<c0102648>] (arch_cpu_idle) from [<c0143d54>] (do_idle+0xc0/0x138)
[    7.209385] [<c0143d54>] (do_idle) from [<c0144048>] (cpu_startup_entry+0x18/0x1c)
[    7.216976] [<c0144048>] (cpu_startup_entry) from [<c0800e74>] (start_kernel+0x4f8/0x50c)
[    7.225174] [<c0800e74>] (start_kernel) from [<00000000>] (0x0)
[    7.231111] Rebooting in 1 seconds..

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30  9:58 Lockup in phy_probe() for MDIO device (Broadcom's switch) Rafał Miłecki
  2021-09-30 10:15 ` Rafał Miłecki
@ 2021-09-30 10:17 ` Russell King (Oracle)
  2021-09-30 10:30   ` Rafał Miłecki
  1 sibling, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-30 10:17 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On Thu, Sep 30, 2021 at 11:58:21AM +0200, Rafał Miłecki wrote:
> This isn't necessarily a PHY / MDIO regression. It could be some core
> change that exposed a PHY / MDIO bug.

I think what's going on is that the switch device is somehow being
probed by phylib. It looks to me like we don't check that the mdio
device being matched in phy_bus_match() is actually a PHY (by
checking whether mdiodev->flags & MDIO_DEVICE_FLAG_PHY is true
before proceeding with any matching.)

We do, however, check the driver side. This looks to me like a problem
especially when the mdio bus can contain a mixture of PHY devices and
non-PHY devices. However, I would expect this to also be blowing up in
the mainline kernel as well - but it doesn't seem to.

Maybe Andrew can provide a reason why this doesn't happen - maybe we've
just been lucky with out-of-bounds read accesses (to the non-existent
phy_device wrapped around the mdio_device?)

If my theory is correct, this patch should solve your issue:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index ba5ad86ec826..dac017174ab1 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -462,7 +462,8 @@ static int phy_bus_match(struct device *dev, struct device_driver *drv)
 	const int num_ids = ARRAY_SIZE(phydev->c45_ids.device_ids);
 	int i;
 
-	if (!(phydrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY))
+	if (!(phydrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY) ||
+	    !(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY))
 		return 0;
 
 	if (phydrv->match_phy_device)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 10:17 ` Russell King (Oracle)
@ 2021-09-30 10:30   ` Rafał Miłecki
  2021-09-30 10:40     ` Russell King (Oracle)
  2021-09-30 11:22     ` Rafał Miłecki
  0 siblings, 2 replies; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30 10:30 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On 30.09.2021 12:17, Russell King (Oracle) wrote:
> On Thu, Sep 30, 2021 at 11:58:21AM +0200, Rafał Miłecki wrote:
>> This isn't necessarily a PHY / MDIO regression. It could be some core
>> change that exposed a PHY / MDIO bug.
> 
> I think what's going on is that the switch device is somehow being
> probed by phylib. It looks to me like we don't check that the mdio
> device being matched in phy_bus_match() is actually a PHY (by
> checking whether mdiodev->flags & MDIO_DEVICE_FLAG_PHY is true
> before proceeding with any matching.)
> 
> We do, however, check the driver side. This looks to me like a problem
> especially when the mdio bus can contain a mixture of PHY devices and
> non-PHY devices. However, I would expect this to also be blowing up in
> the mainline kernel as well - but it doesn't seem to.
> 
> Maybe Andrew can provide a reason why this doesn't happen - maybe we've
> just been lucky with out-of-bounds read accesses (to the non-existent
> phy_device wrapped around the mdio_device?)

I'll see if I can use buildroot to test unmodified kernel.


> If my theory is correct, this patch should solve your issue:
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index ba5ad86ec826..dac017174ab1 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -462,7 +462,8 @@ static int phy_bus_match(struct device *dev, struct device_driver *drv)
>   	const int num_ids = ARRAY_SIZE(phydev->c45_ids.device_ids);
>   	int i;
>   
> -	if (!(phydrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY))
> +	if (!(phydrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY) ||
> +	    !(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY))
>   		return 0;
>   
>   	if (phydrv->match_phy_device)
> 

Unfortunately this doesn't seem to help

[    6.219828] libphy: Fixed MDIO Bus: probed
[    6.224376] libphy: iProc MDIO bus: probed
[    6.228506] iproc-mdio 18003000.mdio: Broadcom iProc MDIO bus registered
[    6.235906] libphy: mdio_mux: probed
[    6.240298] libphy: mdio_mux: probed
[    6.244316] ------------[ cut here ]------------
[    6.248969] WARNING: CPU: 1 PID: 1 at kernel/locking/mutex.c:951 __mutex_lock.constprop.0+0x744/0x848
[    6.258223] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[    6.258226] Modules linked in:
[    6.266265] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.64 #0
[    6.272278] Hardware name: BCM5301X
[    6.275791] [<c0108410>] (unwind_backtrace) from [<c0104bc4>] (show_stack+0x10/0x14)
[    6.283564] [<c0104bc4>] (show_stack) from [<c03dc6a8>] (dump_stack+0x94/0xa8)
[    6.290812] [<c03dc6a8>] (dump_stack) from [<c01183e8>] (__warn+0xb8/0x114)
[    6.297794] [<c01183e8>] (__warn) from [<c01184ac>] (warn_slowpath_fmt+0x68/0x78)
[    6.305298] [<c01184ac>] (warn_slowpath_fmt) from [<c065defc>] (__mutex_lock.constprop.0+0x744/0x848)
[    6.314549] [<c065defc>] (__mutex_lock.constprop.0) from [<c04b8d8c>] (phy_probe+0x48/0x198)
[    6.323017] [<c04b8d8c>] (phy_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    6.330434] [<c0457120>] (really_probe) from [<c0455378>] (bus_for_each_drv+0x74/0x98)
[    6.338372] [<c0455378>] (bus_for_each_drv) from [<c04576b8>] (__device_attach+0xcc/0x120)
[    6.346657] [<c04576b8>] (__device_attach) from [<c0456300>] (bus_probe_device+0x84/0x8c)
[    6.354856] [<c0456300>] (bus_probe_device) from [<c04529ac>] (device_add+0x300/0x77c)
[    6.362797] [<c04529ac>] (device_add) from [<c04ba398>] (mdio_device_register+0x24/0x48)
[    6.370911] [<c04ba398>] (mdio_device_register) from [<c04c1ac4>] (of_mdiobus_register+0x198/0x2fc)
[    6.379978] [<c04c1ac4>] (of_mdiobus_register) from [<c04c2348>] (mdio_mux_init+0x178/0x2c0)
[    6.388436] [<c04c2348>] (mdio_mux_init) from [<c04c2624>] (mdio_mux_mmioreg_probe+0x138/0x1fc)
[    6.397163] [<c04c2624>] (mdio_mux_mmioreg_probe) from [<c0458ee4>] (platform_drv_probe+0x34/0x70)
[    6.406142] [<c0458ee4>] (platform_drv_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    6.414339] [<c0457120>] (really_probe) from [<c0457b04>] (device_driver_attach+0xe4/0xf4)
[    6.422626] [<c0457b04>] (device_driver_attach) from [<c0457b90>] (__driver_attach+0x7c/0x110)
[    6.431260] [<c0457b90>] (__driver_attach) from [<c04552d8>] (bus_for_each_dev+0x64/0x90)
[    6.439459] [<c04552d8>] (bus_for_each_dev) from [<c04564f8>] (bus_add_driver+0xf8/0x1e0)
[    6.447658] [<c04564f8>] (bus_add_driver) from [<c045819c>] (driver_register+0x88/0x118)
[    6.455772] [<c045819c>] (driver_register) from [<c01017e4>] (do_one_initcall+0x54/0x1e8)
[    6.463978] [<c01017e4>] (do_one_initcall) from [<c0801118>] (kernel_init_freeable+0x23c/0x290)
[    6.472701] [<c0801118>] (kernel_init_freeable) from [<c065ad00>] (kernel_init+0x8/0x118)
[    6.480899] [<c065ad00>] (kernel_init) from [<c0100128>] (ret_from_fork+0x14/0x2c)
[    6.488486] Exception stack(0xc1035fb0 to 0xc1035ff8)
[    6.493545] 5fa0:                                     00000000 00000000 00000000 00000000
[    6.501736] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.509934] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    6.516581] ---[ end trace b8ef68dd409e132c ]---
[    6.521227] 8<--- cut here ---
[    6.524303] Unable to handle kernel NULL pointer dereference at virtual address 00000180
[    6.532409] pgd = 4c4edbcc
[    6.535130] [00000180] *pgd=00000000
[    6.538720] Internal error: Oops: 805 [#1] SMP ARM
[    6.543521] Modules linked in:
[    6.546579] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W         5.10.64 #0
[    6.553992] Hardware name: BCM5301X
[    6.557486] PC is at __mutex_add_waiter+0x34/0x60
[    6.562201] LR is at __mutex_add_waiter+0x24/0x60
[    6.566909] pc : [<c01595cc>]    lr : [<c01595bc>]    psr: 80000013
[    6.573184] sp : c1035c40  ip : c134e040  fp : c1038000
[    6.578416] r10: c090530c  r9 : c06f56f4  r8 : c1035c74
[    6.583646] r7 : c09b24ac  r6 : c123b290  r5 : c123b29c  r4 : c1035c74
[    6.590184] r3 : 00000180  r2 : c1038000  r1 : c1035c74  r0 : c123b290
[    6.596720] Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    6.603867] Control: 10c5387d  Table: 0000404a  DAC: 00000051
[    6.609621] Process swapper/0 (pid: 1, stack limit = 0xf3633140)
[    6.615634] Stack: (0xc1035c40 to 0xc1036000)
[    6.619999] 5c40: c123b290 c123b28c 00000004 c065daa4 c06f56e0 c0763cc8 c1034000 00000002
[    6.628197] 5c60: c0982e64 c123b294 c138d000 c02da7f4 00000000 c123b29c 00000180 11111111
[    6.636396] 5c80: 11111111 c1035c74 00000000 c123b000 c0982e64 00000000 c123b290 00000000
[    6.644595] 5ca0: c0982e64 00000000 c0751344 c04b8d8c c123b000 c09c0880 00000000 c09c0870
[    6.652786] 5cc0: 00000000 c0457120 00000000 00000000 c1035d08 c04578ec c121f578 00000000
[    6.660985] 5ce0: c097eb54 00000000 c0751344 c0455378 c119ab70 c12885b8 c123b000 00000001
[    6.669183] 5d00: c123b044 c04576b8 c123b000 00000001 c123b000 c123b000 c098284c c0456300
[    6.677374] 5d20: c123b000 00000000 c09c0740 c04529ac 00000000 00000000 00000000 c03e23c8
[    6.685573] 5d40: c123b000 0a3031d0 00000000 00000000 c123b000 c6973ddc 00000000 00000000
[    6.693773] 5d60: c123b000 c04ba398 c6973ef8 c121f000 c6973ddc c04c1ac4 00000000 0000003d
[    6.701972] 5d80: ffffff0f 00000001 00000000 c121f578 c0755054 c0753064 00000000 00000000
[    6.710171] 5da0: c6973974 c129c7c0 c129c540 c6973ddc 00000000 c1153410 c6973974 c0751344
[    6.718370] 5dc0: c0755c60 c04c2348 00000200 c04de798 c0755c54 c129c4c0 c1035e18 00000200
[    6.726570] 5de0: 00000200 00000000 c129c4c0 c6973974 c0751344 c1153400 c1153410 00000000
[    6.734769] 5e00: c0839bc8 c04c2624 c129c4c0 c121b000 00000000 00000004 18003000 18003003
[    6.742968] 5e20: c69739cc 00000200 00000000 00000000 00000000 00000000 c0984350 c1153410
[    6.751167] 5e40: c0984350 00000000 c09c0870 00000000 c0984350 c0458ee4 c1153410 c09c0880
[    6.759366] 5e60: 00000000 c0457120 00000000 c1153410 00000000 c1153454 c0984350 c0831854
[    6.767564] 5e80: c0831834 c08003e4 c0839bc8 c0457b04 00000000 c0984350 c1153410 c097ee50
[    6.775755] 5ea0: c0831854 c0457b90 00000000 c0984350 c0457b14 c04552d8 c100e35c c114c534
[    6.783945] 5ec0: c0984350 c129c400 00000000 c04564f8 c0755df0 ffffe000 00000000 c0984350
[    6.792137] 5ee0: 00000000 ffffe000 00000000 c045819c c098e010 c081b88c ffffe000 c01017e4
[    6.800335] 5f00: c108cb00 c108cb15 c07ac224 00000000 0000005f c013593c 00000dc0 c07abacc
[    6.808535] 5f20: c0730bbc 00000006 00000006 c07002b0 c06f4888 c06f483c c108cb15 00000000
[    6.816733] 5f40: 00000000 00000007 c108cb00 0a3031d0 c098e020 00000007 c108cb00 c07abacc
[    6.824924] 5f60: c098e020 c0801118 00000006 00000006 00000000 c08003e4 00000000 0000005f
[    6.833114] 5f80: 00000000 00000000 c065acf8 00000000 00000000 00000000 00000000 00000000
[    6.841305] 5fa0: 00000000 c065ad00 00000000 c0100128 00000000 00000000 00000000 00000000
[    6.849496] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.857694] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[    6.865891] [<c01595cc>] (__mutex_add_waiter) from [<c065daa4>] (__mutex_lock.constprop.0+0x2ec/0x848)
[    6.875220] [<c065daa4>] (__mutex_lock.constprop.0) from [<c04b8d8c>] (phy_probe+0x48/0x198)
[    6.883682] [<c04b8d8c>] (phy_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    6.891093] [<c0457120>] (really_probe) from [<c0455378>] (bus_for_each_drv+0x74/0x98)
[    6.899031] [<c0455378>] (bus_for_each_drv) from [<c04576b8>] (__device_attach+0xcc/0x120)
[    6.907317] [<c04576b8>] (__device_attach) from [<c0456300>] (bus_probe_device+0x84/0x8c)
[    6.915516] [<c0456300>] (bus_probe_device) from [<c04529ac>] (device_add+0x300/0x77c)
[    6.923456] [<c04529ac>] (device_add) from [<c04ba398>] (mdio_device_register+0x24/0x48)
[    6.931568] [<c04ba398>] (mdio_device_register) from [<c04c1ac4>] (of_mdiobus_register+0x198/0x2fc)
[    6.940636] [<c04c1ac4>] (of_mdiobus_register) from [<c04c2348>] (mdio_mux_init+0x178/0x2c0)
[    6.949096] [<c04c2348>] (mdio_mux_init) from [<c04c2624>] (mdio_mux_mmioreg_probe+0x138/0x1fc)
[    6.957820] [<c04c2624>] (mdio_mux_mmioreg_probe) from [<c0458ee4>] (platform_drv_probe+0x34/0x70)
[    6.966800] [<c0458ee4>] (platform_drv_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    6.974999] [<c0457120>] (really_probe) from [<c0457b04>] (device_driver_attach+0xe4/0xf4)
[    6.983285] [<c0457b04>] (device_driver_attach) from [<c0457b90>] (__driver_attach+0x7c/0x110)
[    6.991919] [<c0457b90>] (__driver_attach) from [<c04552d8>] (bus_for_each_dev+0x64/0x90)
[    7.000118] [<c04552d8>] (bus_for_each_dev) from [<c04564f8>] (bus_add_driver+0xf8/0x1e0)
[    7.008310] [<c04564f8>] (bus_add_driver) from [<c045819c>] (driver_register+0x88/0x118)
[    7.016423] [<c045819c>] (driver_register) from [<c01017e4>] (do_one_initcall+0x54/0x1e8)
[    7.024624] [<c01017e4>] (do_one_initcall) from [<c0801118>] (kernel_init_freeable+0x23c/0x290)
[    7.033343] [<c0801118>] (kernel_init_freeable) from [<c065ad00>] (kernel_init+0x8/0x118)
[    7.041541] [<c065ad00>] (kernel_init) from [<c0100128>] (ret_from_fork+0x14/0x2c)
[    7.049128] Exception stack(0xc1035fb0 to 0xc1035ff8)
[    7.054187] 5fa0:                                     00000000 00000000 00000000 00000000
[    7.062377] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    7.070567] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    7.077192] Code: e5953004 e5854004 e5845000 e5843004 (e5834000)
[    7.083321] ---[ end trace b8ef68dd409e132d ]---
[    7.087956] Kernel panic - not syncing: Fatal exception
[    7.093195] CPU0: stopping
[    7.095907] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D W         5.10.64 #0
[    7.103318] Hardware name: BCM5301X
[    7.106820] [<c0108410>] (unwind_backtrace) from [<c0104bc4>] (show_stack+0x10/0x14)
[    7.114592] [<c0104bc4>] (show_stack) from [<c03dc6a8>] (dump_stack+0x94/0xa8)
[    7.121829] [<c03dc6a8>] (dump_stack) from [<c0106c80>] (do_handle_IPI+0xf8/0x12c)
[    7.129418] [<c0106c80>] (do_handle_IPI) from [<c0106ccc>] (ipi_handler+0x18/0x20)
[    7.137017] [<c0106ccc>] (ipi_handler) from [<c0162ae4>] (__handle_domain_irq+0x84/0xd8)
[    7.145132] [<c0162ae4>] (__handle_domain_irq) from [<c03f4fc8>] (gic_handle_irq+0x80/0x94)
[    7.153502] [<c03f4fc8>] (gic_handle_irq) from [<c0100aec>] (__irq_svc+0x6c/0x90)
[    7.161000] Exception stack(0xc0901f48 to 0xc0901f90)
[    7.166060] 1f40:                   000841a2 00000000 000841a4 c010e540 c0900000 00000000
[    7.174251] 1f60: c0904f14 c0904f54 c0831a34 413fc090 10c5387d 00000000 00000000 c0901f98
[    7.182448] 1f80: c0102644 c0102648 60000013 ffffffff
[    7.187508] [<c0100aec>] (__irq_svc) from [<c0102648>] (arch_cpu_idle+0x38/0x3c)
[    7.194934] [<c0102648>] (arch_cpu_idle) from [<c0143d54>] (do_idle+0xc0/0x138)
[    7.202262] [<c0143d54>] (do_idle) from [<c0144048>] (cpu_startup_entry+0x18/0x1c)
[    7.209853] [<c0144048>] (cpu_startup_entry) from [<c0800e74>] (start_kernel+0x4f8/0x50c)
[    7.218050] [<c0800e74>] (start_kernel) from [<00000000>] (0x0)
[    7.223986] Rebooting in 1 seconds..

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 10:30   ` Rafał Miłecki
@ 2021-09-30 10:40     ` Russell King (Oracle)
  2021-09-30 11:29       ` Rafał Miłecki
  2021-09-30 11:22     ` Rafał Miłecki
  1 sibling, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-30 10:40 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On Thu, Sep 30, 2021 at 12:30:52PM +0200, Rafał Miłecki wrote:
> On 30.09.2021 12:17, Russell King (Oracle) wrote:
> > On Thu, Sep 30, 2021 at 11:58:21AM +0200, Rafał Miłecki wrote:
> > > This isn't necessarily a PHY / MDIO regression. It could be some core
> > > change that exposed a PHY / MDIO bug.
> > 
> > I think what's going on is that the switch device is somehow being
> > probed by phylib. It looks to me like we don't check that the mdio
> > device being matched in phy_bus_match() is actually a PHY (by
> > checking whether mdiodev->flags & MDIO_DEVICE_FLAG_PHY is true
> > before proceeding with any matching.)
> > 
> > We do, however, check the driver side. This looks to me like a problem
> > especially when the mdio bus can contain a mixture of PHY devices and
> > non-PHY devices. However, I would expect this to also be blowing up in
> > the mainline kernel as well - but it doesn't seem to.
> > 
> > Maybe Andrew can provide a reason why this doesn't happen - maybe we've
> > just been lucky with out-of-bounds read accesses (to the non-existent
> > phy_device wrapped around the mdio_device?)
> 
> I'll see if I can use buildroot to test unmodified kernel.
> 
> 
> > If my theory is correct, this patch should solve your issue:
> > 
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index ba5ad86ec826..dac017174ab1 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -462,7 +462,8 @@ static int phy_bus_match(struct device *dev, struct device_driver *drv)
> >   	const int num_ids = ARRAY_SIZE(phydev->c45_ids.device_ids);
> >   	int i;
> > -	if (!(phydrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY))
> > +	if (!(phydrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY) ||
> > +	    !(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY))
> >   		return 0;
> >   	if (phydrv->match_phy_device)
> > 
> 
> Unfortunately this doesn't seem to help

Hmm.

In phy_probe, can you add:

	WARN_ON(!(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY));

just to make sure we have a real PHY device there please? Maybe also
print the value of the flags argument.

MDIO_DEVICE_FLAG_PHY is set by phy_create_device() before the mutex is
initialised, so if it is set, the lock should be initialised.

Maybe also print mdiodev->flags in mdio_device_register() as well, so
we can see what is being registered and the flags being used for that
device.

Could it be that openwrt is carrying a patch that is causing this
issue?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 10:30   ` Rafał Miłecki
  2021-09-30 10:40     ` Russell King (Oracle)
@ 2021-09-30 11:22     ` Rafał Miłecki
  1 sibling, 0 replies; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30 11:22 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On 30.09.2021 12:30, Rafał Miłecki wrote:
> On 30.09.2021 12:17, Russell King (Oracle) wrote:
>> On Thu, Sep 30, 2021 at 11:58:21AM +0200, Rafał Miłecki wrote:
>>> This isn't necessarily a PHY / MDIO regression. It could be some core
>>> change that exposed a PHY / MDIO bug.
>>
>> I think what's going on is that the switch device is somehow being
>> probed by phylib. It looks to me like we don't check that the mdio
>> device being matched in phy_bus_match() is actually a PHY (by
>> checking whether mdiodev->flags & MDIO_DEVICE_FLAG_PHY is true
>> before proceeding with any matching.)
>>
>> We do, however, check the driver side. This looks to me like a problem
>> especially when the mdio bus can contain a mixture of PHY devices and
>> non-PHY devices. However, I would expect this to also be blowing up in
>> the mainline kernel as well - but it doesn't seem to.
>>
>> Maybe Andrew can provide a reason why this doesn't happen - maybe we've
>> just been lucky with out-of-bounds read accesses (to the non-existent
>> phy_device wrapped around the mdio_device?)
> 
> I'll see if I can use buildroot to test unmodified kernel.

I've used buildroot to use unmodified 5.10.57 kernel.


Let me start with explaining that there are 2 b53 drivers.

1. OpenWrt downstream swconfig-based b53 driver
Its b53_mdio.c registers as PHY driver by calling phy_driver_register()

2. Upstream DSA-based b53 driver
Its b53_mdio.c registers as MDIO driver by using mdio_module_driver()


With buildroot + kernel 5.10.57 + upstream DSA-based b53 driver I can't
see phy_probe() called for the /mdio-mux@18003000/mdio@200/switch@0 .
I'm not sure why as I have CONFIG_B53_MDIO_DRIVER=y . Maybe it's some
PHY device vs. MDIO device thing?

I'll proceed with Russell's request for checking MDIO_DEVICE_FLAG_PHY
now.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 10:40     ` Russell King (Oracle)
@ 2021-09-30 11:29       ` Rafał Miłecki
  2021-09-30 11:44         ` Russell King (Oracle)
  0 siblings, 1 reply; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30 11:29 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On 30.09.2021 12:40, Russell King (Oracle) wrote:
> In phy_probe, can you add:
> 
> 	WARN_ON(!(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY));
> 
> just to make sure we have a real PHY device there please? Maybe also
> print the value of the flags argument.
> 
> MDIO_DEVICE_FLAG_PHY is set by phy_create_device() before the mutex is
> initialised, so if it is set, the lock should be initialised.
> 
> Maybe also print mdiodev->flags in mdio_device_register() as well, so
> we can see what is being registered and the flags being used for that
> device.
> 
> Could it be that openwrt is carrying a patch that is causing this
> issue?

I don't think there is any OpenWrt patch affecting that.

MDIO_DEVICE_FLAG_PHY seems to be missing.

[    5.593833] libphy: Fixed MDIO Bus: probed
[    5.598383] libphy: iProc MDIO bus: probed
[    5.602510] iproc-mdio 18003000.mdio: Broadcom iProc MDIO bus registered
[    5.609918] libphy: mdio_mux: probed
[    5.613533] mdio_bus 0.0:10: flags: 0x00000000
[    5.618816] libphy: mdio_mux: probed
[    5.622440] mdio_bus 0.200:00: flags: 0x00000000
[    5.627479] Broadcom B53 (2) 0.200:00: flags: 0x00000000
[    5.632811] ------------[ cut here ]------------
[    5.637475] WARNING: CPU: 0 PID: 1 at drivers/net/phy/phy_device.c:2828 phy_probe+0x40/0x1d4
[    5.645946] Modules linked in:
[    5.649011] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.64 #0
[    5.655030] Hardware name: BCM5301X
[    5.658543] [<c0108410>] (unwind_backtrace) from [<c0104bc4>] (show_stack+0x10/0x14)
[    5.666316] [<c0104bc4>] (show_stack) from [<c03dc6a8>] (dump_stack+0x94/0xa8)
[    5.673565] [<c03dc6a8>] (dump_stack) from [<c01183e8>] (__warn+0xb8/0x114)
[    5.680546] [<c01183e8>] (__warn) from [<c01184ac>] (warn_slowpath_fmt+0x68/0x78)
[    5.688050] [<c01184ac>] (warn_slowpath_fmt) from [<c04b8d78>] (phy_probe+0x40/0x1d4)
[    5.695909] [<c04b8d78>] (phy_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    5.703325] [<c0457120>] (really_probe) from [<c0455378>] (bus_for_each_drv+0x74/0x98)
[    5.711262] [<c0455378>] (bus_for_each_drv) from [<c04576b8>] (__device_attach+0xcc/0x120)
[    5.719548] [<c04576b8>] (__device_attach) from [<c0456300>] (bus_probe_device+0x84/0x8c)
[    5.727747] [<c0456300>] (bus_probe_device) from [<c04529ac>] (device_add+0x300/0x77c)
[    5.735687] [<c04529ac>] (device_add) from [<c04ba3dc>] (mdio_device_register+0x38/0x5c)
[    5.743801] [<c04ba3dc>] (mdio_device_register) from [<c04c1b08>] (of_mdiobus_register+0x198/0x2fc)
[    5.752868] [<c04c1b08>] (of_mdiobus_register) from [<c04c238c>] (mdio_mux_init+0x178/0x2c0)
[    5.761328] [<c04c238c>] (mdio_mux_init) from [<c04c2668>] (mdio_mux_mmioreg_probe+0x138/0x1fc)
[    5.770054] [<c04c2668>] (mdio_mux_mmioreg_probe) from [<c0458ee4>] (platform_drv_probe+0x34/0x70)
[    5.779032] [<c0458ee4>] (platform_drv_probe) from [<c0457120>] (really_probe+0xfc/0x4e0)
[    5.787230] [<c0457120>] (really_probe) from [<c0457b04>] (device_driver_attach+0xe4/0xf4)
[    5.795516] [<c0457b04>] (device_driver_attach) from [<c0457b90>] (__driver_attach+0x7c/0x110)
[    5.804149] [<c0457b90>] (__driver_attach) from [<c04552d8>] (bus_for_each_dev+0x64/0x90)
[    5.812340] [<c04552d8>] (bus_for_each_dev) from [<c04564f8>] (bus_add_driver+0xf8/0x1e0)
[    5.820540] [<c04564f8>] (bus_add_driver) from [<c045819c>] (driver_register+0x88/0x118)
[    5.828652] [<c045819c>] (driver_register) from [<c01017e4>] (do_one_initcall+0x54/0x1e8)
[    5.836863] [<c01017e4>] (do_one_initcall) from [<c0801118>] (kernel_init_freeable+0x23c/0x290)
[    5.845583] [<c0801118>] (kernel_init_freeable) from [<c065ad40>] (kernel_init+0x8/0x118)
[    5.853781] [<c065ad40>] (kernel_init) from [<c0100128>] (ret_from_fork+0x14/0x2c)
[    5.861367] Exception stack(0xc1035fb0 to 0xc1035ff8)
[    5.866425] 5fa0:                                     00000000 00000000 00000000 00000000
[    5.874618] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    5.882816] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    5.889473] ---[ end trace f28513567c7daf95 ]---
[    5.894105] ------------[ cut here ]------------
[    5.898743] WARNING: CPU: 0 PID: 1 at kernel/locking/mutex.c:951 __mutex_lock.constprop.0+0x744/0x848
[    5.907983] DEBUG_LOCKS_WARN_ON(lock->magic != lock)


diff --git a/drivers/net/phy/mdio_device.c b/drivers/net/phy/mdio_device.c
index 0837319a5..910149c8d 100644
--- a/drivers/net/phy/mdio_device.c
+++ b/drivers/net/phy/mdio_device.c
@@ -83,6 +83,8 @@ int mdio_device_register(struct mdio_device *mdiodev)
         if (err)
                 return err;

+       dev_info(&mdiodev->dev, "flags: 0x%08x\n", mdiodev->flags);
+
         err = device_add(&mdiodev->dev);
         if (err) {
                 pr_err("MDIO %d failed to add\n", mdiodev->addr);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 950277e4d..02c06e6d5 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2824,6 +2824,9 @@ static int phy_probe(struct device *dev)
         struct phy_driver *phydrv = to_phy_driver(drv);
         int err = 0;

+       dev_info(dev, "flags: 0x%08x\n", phydev->mdio.flags);
+       WARN_ON(!(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY));
+
         phydev->drv = phydrv;

         /* Disable the interrupt if the PHY doesn't support it

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 11:29       ` Rafał Miłecki
@ 2021-09-30 11:44         ` Russell King (Oracle)
  2021-09-30 12:14           ` Rafał Miłecki
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-30 11:44 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On Thu, Sep 30, 2021 at 01:29:33PM +0200, Rafał Miłecki wrote:
> On 30.09.2021 12:40, Russell King (Oracle) wrote:
> > In phy_probe, can you add:
> > 
> > 	WARN_ON(!(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY));
> > 
> > just to make sure we have a real PHY device there please? Maybe also
> > print the value of the flags argument.
> > 
> > MDIO_DEVICE_FLAG_PHY is set by phy_create_device() before the mutex is
> > initialised, so if it is set, the lock should be initialised.
> > 
> > Maybe also print mdiodev->flags in mdio_device_register() as well, so
> > we can see what is being registered and the flags being used for that
> > device.
> > 
> > Could it be that openwrt is carrying a patch that is causing this
> > issue?
> 
> I don't think there is any OpenWrt patch affecting that.
> 
> MDIO_DEVICE_FLAG_PHY seems to be missing.

Right, so the mdio device being registered is a non-PHY MDIO device.
It doesn't have a struct phy_device around it - and so any access
outside of the mdio_device is an out-of-bounds access.

Consequently, phylib should not be matching this device. The only
remaining way I can see that this could happen is if a PHY driver has
an OF compatible, which phylib drivers should never have.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 11:44         ` Russell King (Oracle)
@ 2021-09-30 12:14           ` Rafał Miłecki
  2021-09-30 12:30             ` Russell King (Oracle)
  0 siblings, 1 reply; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30 12:14 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On 30.09.2021 13:44, Russell King (Oracle) wrote:
> On Thu, Sep 30, 2021 at 01:29:33PM +0200, Rafał Miłecki wrote:
>> On 30.09.2021 12:40, Russell King (Oracle) wrote:
>>> In phy_probe, can you add:
>>>
>>> 	WARN_ON(!(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY));
>>>
>>> just to make sure we have a real PHY device there please? Maybe also
>>> print the value of the flags argument.
>>>
>>> MDIO_DEVICE_FLAG_PHY is set by phy_create_device() before the mutex is
>>> initialised, so if it is set, the lock should be initialised.
>>>
>>> Maybe also print mdiodev->flags in mdio_device_register() as well, so
>>> we can see what is being registered and the flags being used for that
>>> device.
>>>
>>> Could it be that openwrt is carrying a patch that is causing this
>>> issue?
>>
>> I don't think there is any OpenWrt patch affecting that.
>>
>> MDIO_DEVICE_FLAG_PHY seems to be missing.
> 
> Right, so the mdio device being registered is a non-PHY MDIO device.
> It doesn't have a struct phy_device around it - and so any access
> outside of the mdio_device is an out-of-bounds access.

I can confirm that.

of_mdiobus_register() iterates over node children. It calls
of_mdiobus_child_is_phy() for the /mdio-mux@18003000/mdio@200/switch@0
and that returns 0. It results in calling of_mdiobus_register_device().

So we have MDIO device as expected. It's not a PHY device.


> Consequently, phylib should not be matching this device. The only
> remaining way I can see that this could happen is if a PHY driver has
> an OF compatible, which phylib drivers should never have.

It's actually OpenWrt's downstream swconfig-based b53 driver that
matches this device.

I'm confused as downstream b53_mdio.c calls phy_driver_register(). Why
does it match MDIO device then? I thought MDIO devices should be
matches only with drivers using mdio_driver_register().

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 12:14           ` Rafał Miłecki
@ 2021-09-30 12:30             ` Russell King (Oracle)
  2021-09-30 12:51               ` Rafał Miłecki
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-30 12:30 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On Thu, Sep 30, 2021 at 02:14:54PM +0200, Rafał Miłecki wrote:
> On 30.09.2021 13:44, Russell King (Oracle) wrote:
> > On Thu, Sep 30, 2021 at 01:29:33PM +0200, Rafał Miłecki wrote:
> > > On 30.09.2021 12:40, Russell King (Oracle) wrote:
> > > > In phy_probe, can you add:
> > > > 
> > > > 	WARN_ON(!(phydev->mdio.flags & MDIO_DEVICE_FLAG_PHY));
> > > > 
> > > > just to make sure we have a real PHY device there please? Maybe also
> > > > print the value of the flags argument.
> > > > 
> > > > MDIO_DEVICE_FLAG_PHY is set by phy_create_device() before the mutex is
> > > > initialised, so if it is set, the lock should be initialised.
> > > > 
> > > > Maybe also print mdiodev->flags in mdio_device_register() as well, so
> > > > we can see what is being registered and the flags being used for that
> > > > device.
> > > > 
> > > > Could it be that openwrt is carrying a patch that is causing this
> > > > issue?
> > > 
> > > I don't think there is any OpenWrt patch affecting that.
> > > 
> > > MDIO_DEVICE_FLAG_PHY seems to be missing.
> > 
> > Right, so the mdio device being registered is a non-PHY MDIO device.
> > It doesn't have a struct phy_device around it - and so any access
> > outside of the mdio_device is an out-of-bounds access.
> 
> I can confirm that.
> 
> of_mdiobus_register() iterates over node children. It calls
> of_mdiobus_child_is_phy() for the /mdio-mux@18003000/mdio@200/switch@0
> and that returns 0. It results in calling of_mdiobus_register_device().
> 
> So we have MDIO device as expected. It's not a PHY device.

Right - because it's a switch device - identified as having a
compatible but without having a PHY-like compatible.

	compatible = "brcm,bcm53125";

This will be created by mdio_device_create(), which will not have
its bus_match method populated. So the only way a driver registered
on the MDIO bus_type can match is via the DT-based matching I've
previously mentioned.

> > Consequently, phylib should not be matching this device. The only
> > remaining way I can see that this could happen is if a PHY driver has
> > an OF compatible, which phylib drivers should never have.
> 
> It's actually OpenWrt's downstream swconfig-based b53 driver that
> matches this device.
> 
> I'm confused as downstream b53_mdio.c calls phy_driver_register(). Why
> does it match MDIO device then? I thought MDIO devices should be
> matches only with drivers using mdio_driver_register().

Note that I've no idea what he swconfig-based b53 driver looks like,
I don't have the source for that to hand.

If it calls phy_driver_register(), then it is registering a driver for
a MDIO device wrapped in a struct phy_device. If this driver has a
.of_match_table member set, then this is wrong - the basic rule is

	PHY drivers must never match using DT compatibles.

because this is exactly what will occur - it bypasses the check that
the mdio_device being matched is in fact wrapped by a struct phy_device,
and we will access members of the non-existent phy_device, including
the "uninitialised" mutex.

If the swconfig-based b53 driver does want to bind to a phy_device based
DT node, then it needs to match using either a custom .match_phy_device
method in the PHY driver, or it needs to match using the PHY IDs.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 12:30             ` Russell King (Oracle)
@ 2021-09-30 12:51               ` Rafał Miłecki
  2021-09-30 13:07                 ` Russell King (Oracle)
  0 siblings, 1 reply; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30 12:51 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On 30.09.2021 14:30, Russell King (Oracle) wrote:
>> It's actually OpenWrt's downstream swconfig-based b53 driver that
>> matches this device.
>>
>> I'm confused as downstream b53_mdio.c calls phy_driver_register(). Why
>> does it match MDIO device then? I thought MDIO devices should be
>> matches only with drivers using mdio_driver_register().
> 
> Note that I've no idea what he swconfig-based b53 driver looks like,
> I don't have the source for that to hand.
> 
> If it calls phy_driver_register(), then it is registering a driver for
> a MDIO device wrapped in a struct phy_device. If this driver has a
> .of_match_table member set, then this is wrong - the basic rule is
> 
> 	PHY drivers must never match using DT compatibles.
> 
> because this is exactly what will occur - it bypasses the check that
> the mdio_device being matched is in fact wrapped by a struct phy_device,
> and we will access members of the non-existent phy_device, including
> the "uninitialised" mutex.
> 
> If the swconfig-based b53 driver does want to bind to a phy_device based
> DT node, then it needs to match using either a custom .match_phy_device
> method in the PHY driver, or it needs to match using the PHY IDs.

Sorry, I should have linked downstream b53_mdio.c . It's:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/files/drivers/net/phy/b53/b53_mdio.c;h=98cdbffe73c7354f4401389dfcc96014bff62588;hb=HEAD

You can see that is *uses* of_match_table.

What about refusing bugged drivers like above b53 with something like:

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index b848439fa..76c8197d3 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -941,6 +941,10 @@ static int mdio_bus_match(struct device *dev, struct device_driver *drv)
  {
         struct mdio_device *mdio = to_mdio_device(dev);

+       if (WARN_ONCE(!(mdio->flags & MDIO_DEVICE_FLAG_PHY),
+                "Bugged driver %s\n", drv->name))
+               return 0;
+
         if (of_driver_match_device(dev, drv))
                 return 1;


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 12:51               ` Rafał Miłecki
@ 2021-09-30 13:07                 ` Russell King (Oracle)
  2021-09-30 13:21                   ` Russell King (Oracle)
  2021-09-30 13:42                   ` Rafał Miłecki
  0 siblings, 2 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-30 13:07 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On Thu, Sep 30, 2021 at 02:51:40PM +0200, Rafał Miłecki wrote:
> On 30.09.2021 14:30, Russell King (Oracle) wrote:
> > > It's actually OpenWrt's downstream swconfig-based b53 driver that
> > > matches this device.
> > > 
> > > I'm confused as downstream b53_mdio.c calls phy_driver_register(). Why
> > > does it match MDIO device then? I thought MDIO devices should be
> > > matches only with drivers using mdio_driver_register().
> > 
> > Note that I've no idea what he swconfig-based b53 driver looks like,
> > I don't have the source for that to hand.
> > 
> > If it calls phy_driver_register(), then it is registering a driver for
> > a MDIO device wrapped in a struct phy_device. If this driver has a
> > .of_match_table member set, then this is wrong - the basic rule is
> > 
> > 	PHY drivers must never match using DT compatibles.
> > 
> > because this is exactly what will occur - it bypasses the check that
> > the mdio_device being matched is in fact wrapped by a struct phy_device,
> > and we will access members of the non-existent phy_device, including
> > the "uninitialised" mutex.
> > 
> > If the swconfig-based b53 driver does want to bind to a phy_device based
> > DT node, then it needs to match using either a custom .match_phy_device
> > method in the PHY driver, or it needs to match using the PHY IDs.
> 
> Sorry, I should have linked downstream b53_mdio.c . It's:
> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/files/drivers/net/phy/b53/b53_mdio.c;h=98cdbffe73c7354f4401389dfcc96014bff62588;hb=HEAD

Yes, I just found a version of the driver

> You can see that is *uses* of_match_table.
> 
> What about refusing bugged drivers like above b53 with something like:

That will break all the MDIO based DSA and other non-PHY drivers,
sorry.

I suppose we could detect if the driver has the MDIO_DEVICE_IS_PHY flag
set, and reject any device that does not have MDIO_DEVICE_IS_PHY set:

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 53f034fc2ef7..7bc06126ce10 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -939,6 +939,12 @@ EXPORT_SYMBOL_GPL(mdiobus_modify);
 static int mdio_bus_match(struct device *dev, struct device_driver *drv)
 {
 	struct mdio_device *mdio = to_mdio_device(dev);
+	struct mdio_driver *mdiodrv = to_mdio_driver(drv);
+
+	/* Both the driver and device must type-match */
+	if (!(mdiodrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY) ==
+	    !(mdio->flags & MDIO_DEVICE_FLAG_PHY))
+		return 0;
 
 	if (of_driver_match_device(dev, drv))
 		return 1;

In other words, the driver's state of the MDIO_DEVICE_IS_PHY flag
must match the device's MDIO_DEVICE_FLAG_PHY flag before we attempt
any matches.

If that's not possible, then we need to prevent phylib drivers from
using .of_match_table:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index e62f7a232307..dacf0b31b167 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2501,6 +2501,16 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
 		return -EINVAL;
 	}
 
+	/* PHYLIB device drivers must not match using a DT compatible table
+	 * as this bypasses our checks that the mdiodev that is being matched
+	 * is backed by a struct phy_device. If such a case happens, we will
+	 * make out-of-bounds accesses and lockup in phydev->lock.
+	 */
+	if (WARN(new_driver->mdiodrv.driver.of_match_table,
+		 "%s: driver must not provide a DT match table\n",
+		 new_driver->name))
+		return -EINVAL;
+
 	new_driver->mdiodrv.flags |= MDIO_DEVICE_IS_PHY;
 	new_driver->mdiodrv.driver.name = new_driver->name;
 	new_driver->mdiodrv.driver.bus = &mdio_bus_type;

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 13:07                 ` Russell King (Oracle)
@ 2021-09-30 13:21                   ` Russell King (Oracle)
  2021-09-30 13:32                     ` Andrew Lunn
  2021-09-30 13:42                   ` Rafał Miłecki
  1 sibling, 1 reply; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-30 13:21 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On Thu, Sep 30, 2021 at 02:07:44PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 30, 2021 at 02:51:40PM +0200, Rafał Miłecki wrote:
> > On 30.09.2021 14:30, Russell King (Oracle) wrote:
> > > > It's actually OpenWrt's downstream swconfig-based b53 driver that
> > > > matches this device.
> > > > 
> > > > I'm confused as downstream b53_mdio.c calls phy_driver_register(). Why
> > > > does it match MDIO device then? I thought MDIO devices should be
> > > > matches only with drivers using mdio_driver_register().
> > > 
> > > Note that I've no idea what he swconfig-based b53 driver looks like,
> > > I don't have the source for that to hand.
> > > 
> > > If it calls phy_driver_register(), then it is registering a driver for
> > > a MDIO device wrapped in a struct phy_device. If this driver has a
> > > .of_match_table member set, then this is wrong - the basic rule is
> > > 
> > > 	PHY drivers must never match using DT compatibles.
> > > 
> > > because this is exactly what will occur - it bypasses the check that
> > > the mdio_device being matched is in fact wrapped by a struct phy_device,
> > > and we will access members of the non-existent phy_device, including
> > > the "uninitialised" mutex.
> > > 
> > > If the swconfig-based b53 driver does want to bind to a phy_device based
> > > DT node, then it needs to match using either a custom .match_phy_device
> > > method in the PHY driver, or it needs to match using the PHY IDs.
> > 
> > Sorry, I should have linked downstream b53_mdio.c . It's:
> > https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/files/drivers/net/phy/b53/b53_mdio.c;h=98cdbffe73c7354f4401389dfcc96014bff62588;hb=HEAD
> 
> Yes, I just found a version of the driver
> 
> > You can see that is *uses* of_match_table.
> > 
> > What about refusing bugged drivers like above b53 with something like:
> 
> That will break all the MDIO based DSA and other non-PHY drivers,
> sorry.
> 
> I suppose we could detect if the driver has the MDIO_DEVICE_IS_PHY flag
> set, and reject any device that does not have MDIO_DEVICE_IS_PHY set:
> 
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index 53f034fc2ef7..7bc06126ce10 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -939,6 +939,12 @@ EXPORT_SYMBOL_GPL(mdiobus_modify);
>  static int mdio_bus_match(struct device *dev, struct device_driver *drv)
>  {
>  	struct mdio_device *mdio = to_mdio_device(dev);
> +	struct mdio_driver *mdiodrv = to_mdio_driver(drv);
> +
> +	/* Both the driver and device must type-match */
> +	if (!(mdiodrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY) ==
> +	    !(mdio->flags & MDIO_DEVICE_FLAG_PHY))
> +		return 0;
>  
>  	if (of_driver_match_device(dev, drv))
>  		return 1;
> 
> In other words, the driver's state of the MDIO_DEVICE_IS_PHY flag
> must match the device's MDIO_DEVICE_FLAG_PHY flag before we attempt
> any matches.
> 
> If that's not possible, then we need to prevent phylib drivers from
> using .of_match_table:
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index e62f7a232307..dacf0b31b167 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2501,6 +2501,16 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>  		return -EINVAL;
>  	}
>  
> +	/* PHYLIB device drivers must not match using a DT compatible table
> +	 * as this bypasses our checks that the mdiodev that is being matched
> +	 * is backed by a struct phy_device. If such a case happens, we will
> +	 * make out-of-bounds accesses and lockup in phydev->lock.
> +	 */
> +	if (WARN(new_driver->mdiodrv.driver.of_match_table,
> +		 "%s: driver must not provide a DT match table\n",
> +		 new_driver->name))
> +		return -EINVAL;
> +
>  	new_driver->mdiodrv.flags |= MDIO_DEVICE_IS_PHY;
>  	new_driver->mdiodrv.driver.name = new_driver->name;
>  	new_driver->mdiodrv.driver.bus = &mdio_bus_type;

I should also point out that as this b53 driver that is causing the
problem only exists in OpenWRT, this is really a matter for OpenWRT
developers rather than mainline which does not suffer this problem.
I suspect that OpenWRT developers will not be happy with either of
the two patches I've posted above - I suspect they are trying to
support both DSA and swconfig approaches with a single DT. That can
be made to work, but not with a PHYLIB driver being a wrapper around
the swconfig stuff (precisely because there's no phy_device in this
scenario.)

The only reason to patch mainline kernels would be to make them more
robust, and maybe to also make an explicit statement about what isn't
supported (having a phy_driver with its of_match_table member set.)

I probably should've made that clearer in my email with the patches.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 13:21                   ` Russell King (Oracle)
@ 2021-09-30 13:32                     ` Andrew Lunn
  2021-09-30 13:47                       ` Rafał Miłecki
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Lunn @ 2021-09-30 13:32 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Rafał Miłecki, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

> I should also point out that as this b53 driver that is causing the
> problem only exists in OpenWRT, this is really a matter for OpenWRT
> developers rather than mainline which does not suffer this problem.
> I suspect that OpenWRT developers will not be happy with either of
> the two patches I've posted above - I suspect they are trying to
> support both DSA and swconfig approaches with a single DT. That can
> be made to work, but not with a PHYLIB driver being a wrapper around
> the swconfig stuff (precisely because there's no phy_device in this
> scenario.)
> 
> The only reason to patch mainline kernels would be to make them more
> robust, and maybe to also make an explicit statement about what isn't
> supported (having a phy_driver with its of_match_table member set.)

I agree with you here. This is an OpenWRT problem. We would hopefully
catch such a driver at review time and reject it. We could make it
more robust in mainline, but as you said, OpenWRT developers might not
actually like it more robust.

	 Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 13:07                 ` Russell King (Oracle)
  2021-09-30 13:21                   ` Russell King (Oracle)
@ 2021-09-30 13:42                   ` Rafał Miłecki
  2021-09-30 13:54                     ` Russell King (Oracle)
  1 sibling, 1 reply; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30 13:42 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On 30.09.2021 15:07, Russell King (Oracle) wrote:
> On Thu, Sep 30, 2021 at 02:51:40PM +0200, Rafał Miłecki wrote:
>> On 30.09.2021 14:30, Russell King (Oracle) wrote:
>>>> It's actually OpenWrt's downstream swconfig-based b53 driver that
>>>> matches this device.
>>>>
>>>> I'm confused as downstream b53_mdio.c calls phy_driver_register(). Why
>>>> does it match MDIO device then? I thought MDIO devices should be
>>>> matches only with drivers using mdio_driver_register().
>>>
>>> Note that I've no idea what he swconfig-based b53 driver looks like,
>>> I don't have the source for that to hand.
>>>
>>> If it calls phy_driver_register(), then it is registering a driver for
>>> a MDIO device wrapped in a struct phy_device. If this driver has a
>>> .of_match_table member set, then this is wrong - the basic rule is
>>>
>>> 	PHY drivers must never match using DT compatibles.
>>>
>>> because this is exactly what will occur - it bypasses the check that
>>> the mdio_device being matched is in fact wrapped by a struct phy_device,
>>> and we will access members of the non-existent phy_device, including
>>> the "uninitialised" mutex.
>>>
>>> If the swconfig-based b53 driver does want to bind to a phy_device based
>>> DT node, then it needs to match using either a custom .match_phy_device
>>> method in the PHY driver, or it needs to match using the PHY IDs.
>>
>> Sorry, I should have linked downstream b53_mdio.c . It's:
>> https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/files/drivers/net/phy/b53/b53_mdio.c;h=98cdbffe73c7354f4401389dfcc96014bff62588;hb=HEAD
> 
> Yes, I just found a version of the driver
> 
>> You can see that is *uses* of_match_table.
>>
>> What about refusing bugged drivers like above b53 with something like:
> 
> That will break all the MDIO based DSA and other non-PHY drivers,
> sorry.
> 
> I suppose we could detect if the driver has the MDIO_DEVICE_IS_PHY flag
> set, and reject any device that does not have MDIO_DEVICE_IS_PHY set:
> 
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index 53f034fc2ef7..7bc06126ce10 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -939,6 +939,12 @@ EXPORT_SYMBOL_GPL(mdiobus_modify);
>   static int mdio_bus_match(struct device *dev, struct device_driver *drv)
>   {
>   	struct mdio_device *mdio = to_mdio_device(dev);
> +	struct mdio_driver *mdiodrv = to_mdio_driver(drv);
> +
> +	/* Both the driver and device must type-match */
> +	if (!(mdiodrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY) ==
> +	    !(mdio->flags & MDIO_DEVICE_FLAG_PHY))
> +		return 0;
>   
>   	if (of_driver_match_device(dev, drv))
>   		return 1;
> 

In OpenWrt & bugged b53 case we have:
1. Device without MDIO_DEVICE_FLAG_PHY
2. Driver with MDIO_DEVICE_IS_PHY

I think the logic should be to return 0 on mismatch (reverted).

Above code doesn't prevent probing bugged b53 driver.


> In other words, the driver's state of the MDIO_DEVICE_IS_PHY flag
> must match the device's MDIO_DEVICE_FLAG_PHY flag before we attempt
> any matches.
> 
> If that's not possible, then we need to prevent phylib drivers from
> using .of_match_table:
> 
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index e62f7a232307..dacf0b31b167 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -2501,6 +2501,16 @@ int phy_driver_register(struct phy_driver *new_driver, struct module *owner)
>   		return -EINVAL;
>   	}
>   
> +	/* PHYLIB device drivers must not match using a DT compatible table
> +	 * as this bypasses our checks that the mdiodev that is being matched
> +	 * is backed by a struct phy_device. If such a case happens, we will
> +	 * make out-of-bounds accesses and lockup in phydev->lock.
> +	 */
> +	if (WARN(new_driver->mdiodrv.driver.of_match_table,
> +		 "%s: driver must not provide a DT match table\n",
> +		 new_driver->name))
> +		return -EINVAL;
> +
>   	new_driver->mdiodrv.flags |= MDIO_DEVICE_IS_PHY;
>   	new_driver->mdiodrv.driver.name = new_driver->name;
>   	new_driver->mdiodrv.driver.bus = &mdio_bus_type;

FWIW it prevents probing b53:

[    6.226037] ------------[ cut here ]------------
[    6.230687] WARNING: CPU: 1 PID: 1 at drivers/net/phy/phy_device.c:2964 phy_driver_register+0xe4/0x108
[    6.240073] Broadcom B53 (1): driver must not provide a DT match table
[    6.246627] Modules linked in:
[    6.249696] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.10.64 #0
[    6.255714] Hardware name: BCM5301X
[    6.259229] [<c0108410>] (unwind_backtrace) from [<c0104bc4>] (show_stack+0x10/0x14)
[    6.266999] [<c0104bc4>] (show_stack) from [<c03dc6a8>] (dump_stack+0x94/0xa8)
[    6.274249] [<c03dc6a8>] (dump_stack) from [<c01183e8>] (__warn+0xb8/0x114)
[    6.281230] [<c01183e8>] (__warn) from [<c01184ac>] (warn_slowpath_fmt+0x68/0x78)
[    6.288736] [<c01184ac>] (warn_slowpath_fmt) from [<c04b7278>] (phy_driver_register+0xe4/0x108)
[    6.297464] [<c04b7278>] (phy_driver_register) from [<c081b72c>] (b53_phy_driver_register+0x14/0x6c)
[    6.306622] [<c081b72c>] (b53_phy_driver_register) from [<c01017e4>] (do_one_initcall+0x54/0x1e8)
[    6.315526] [<c01017e4>] (do_one_initcall) from [<c0801118>] (kernel_init_freeable+0x23c/0x290)
[    6.324246] [<c0801118>] (kernel_init_freeable) from [<c065acd8>] (kernel_init+0x8/0x118)
[    6.332445] [<c065acd8>] (kernel_init) from [<c0100128>] (ret_from_fork+0x14/0x2c)
[    6.340031] Exception stack(0xc1035fb0 to 0xc1035ff8)
[    6.345089] 5fa0:                                     00000000 00000000 00000000 00000000
[    6.353280] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    6.361470] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    6.368119] ---[ end trace efac8022c3486581 ]---

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 13:32                     ` Andrew Lunn
@ 2021-09-30 13:47                       ` Rafał Miłecki
  0 siblings, 0 replies; 17+ messages in thread
From: Rafał Miłecki @ 2021-09-30 13:47 UTC (permalink / raw)
  To: Andrew Lunn, Russell King (Oracle)
  Cc: Heiner Kallweit, Network Development, Florian Fainelli,
	BCM Kernel Feedback, Vivek Unune

On 30.09.2021 15:32, Andrew Lunn wrote:
>> I should also point out that as this b53 driver that is causing the
>> problem only exists in OpenWRT, this is really a matter for OpenWRT
>> developers rather than mainline which does not suffer this problem.
>> I suspect that OpenWRT developers will not be happy with either of
>> the two patches I've posted above - I suspect they are trying to
>> support both DSA and swconfig approaches with a single DT. That can
>> be made to work, but not with a PHYLIB driver being a wrapper around
>> the swconfig stuff (precisely because there's no phy_device in this
>> scenario.)
>>
>> The only reason to patch mainline kernels would be to make them more
>> robust, and maybe to also make an explicit statement about what isn't
>> supported (having a phy_driver with its of_match_table member set.)
> 
> I agree with you here. This is an OpenWRT problem. We would hopefully
> catch such a driver at review time and reject it. We could make it
> more robust in mainline, but as you said, OpenWRT developers might not
> actually like it more robust.
I was thinking about patching mdio_bus_match() / phy_driver_register()
to prevent other developers from doing the same mistake as OpenWrt &
b53. Also saving your time from reports similar to mine.

I understand it's an issue that OpenWrt has to handle downstream.

Thank you a lot for helping me investigate this problem.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Lockup in phy_probe() for MDIO device (Broadcom's switch)
  2021-09-30 13:42                   ` Rafał Miłecki
@ 2021-09-30 13:54                     ` Russell King (Oracle)
  0 siblings, 0 replies; 17+ messages in thread
From: Russell King (Oracle) @ 2021-09-30 13:54 UTC (permalink / raw)
  To: Rafał Miłecki
  Cc: Andrew Lunn, Heiner Kallweit, Network Development,
	Florian Fainelli, BCM Kernel Feedback, Vivek Unune

On Thu, Sep 30, 2021 at 03:42:38PM +0200, Rafał Miłecki wrote:
> On 30.09.2021 15:07, Russell King (Oracle) wrote:
> > On Thu, Sep 30, 2021 at 02:51:40PM +0200, Rafał Miłecki wrote:
> > > On 30.09.2021 14:30, Russell King (Oracle) wrote:
> > > > > It's actually OpenWrt's downstream swconfig-based b53 driver that
> > > > > matches this device.
> > > > > 
> > > > > I'm confused as downstream b53_mdio.c calls phy_driver_register(). Why
> > > > > does it match MDIO device then? I thought MDIO devices should be
> > > > > matches only with drivers using mdio_driver_register().
> > > > 
> > > > Note that I've no idea what he swconfig-based b53 driver looks like,
> > > > I don't have the source for that to hand.
> > > > 
> > > > If it calls phy_driver_register(), then it is registering a driver for
> > > > a MDIO device wrapped in a struct phy_device. If this driver has a
> > > > .of_match_table member set, then this is wrong - the basic rule is
> > > > 
> > > > 	PHY drivers must never match using DT compatibles.
> > > > 
> > > > because this is exactly what will occur - it bypasses the check that
> > > > the mdio_device being matched is in fact wrapped by a struct phy_device,
> > > > and we will access members of the non-existent phy_device, including
> > > > the "uninitialised" mutex.
> > > > 
> > > > If the swconfig-based b53 driver does want to bind to a phy_device based
> > > > DT node, then it needs to match using either a custom .match_phy_device
> > > > method in the PHY driver, or it needs to match using the PHY IDs.
> > > 
> > > Sorry, I should have linked downstream b53_mdio.c . It's:
> > > https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/files/drivers/net/phy/b53/b53_mdio.c;h=98cdbffe73c7354f4401389dfcc96014bff62588;hb=HEAD
> > 
> > Yes, I just found a version of the driver
> > 
> > > You can see that is *uses* of_match_table.
> > > 
> > > What about refusing bugged drivers like above b53 with something like:
> > 
> > That will break all the MDIO based DSA and other non-PHY drivers,
> > sorry.
> > 
> > I suppose we could detect if the driver has the MDIO_DEVICE_IS_PHY flag
> > set, and reject any device that does not have MDIO_DEVICE_IS_PHY set:
> > 
> > diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> > index 53f034fc2ef7..7bc06126ce10 100644
> > --- a/drivers/net/phy/mdio_bus.c
> > +++ b/drivers/net/phy/mdio_bus.c
> > @@ -939,6 +939,12 @@ EXPORT_SYMBOL_GPL(mdiobus_modify);
> >   static int mdio_bus_match(struct device *dev, struct device_driver *drv)
> >   {
> >   	struct mdio_device *mdio = to_mdio_device(dev);
> > +	struct mdio_driver *mdiodrv = to_mdio_driver(drv);
> > +
> > +	/* Both the driver and device must type-match */
> > +	if (!(mdiodrv->mdiodrv.flags & MDIO_DEVICE_IS_PHY) ==
> > +	    !(mdio->flags & MDIO_DEVICE_FLAG_PHY))
> > +		return 0;
> >   	if (of_driver_match_device(dev, drv))
> >   		return 1;
> > 
> 
> In OpenWrt & bugged b53 case we have:
> 1. Device without MDIO_DEVICE_FLAG_PHY
> 2. Driver with MDIO_DEVICE_IS_PHY
> 
> I think the logic should be to return 0 on mismatch (reverted).

I assume you mean the test should've been != not == - then yes, you
are absolutely correct. Sorry, I'm still not over a cold.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-09-30 13:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-30  9:58 Lockup in phy_probe() for MDIO device (Broadcom's switch) Rafał Miłecki
2021-09-30 10:15 ` Rafał Miłecki
2021-09-30 10:17 ` Russell King (Oracle)
2021-09-30 10:30   ` Rafał Miłecki
2021-09-30 10:40     ` Russell King (Oracle)
2021-09-30 11:29       ` Rafał Miłecki
2021-09-30 11:44         ` Russell King (Oracle)
2021-09-30 12:14           ` Rafał Miłecki
2021-09-30 12:30             ` Russell King (Oracle)
2021-09-30 12:51               ` Rafał Miłecki
2021-09-30 13:07                 ` Russell King (Oracle)
2021-09-30 13:21                   ` Russell King (Oracle)
2021-09-30 13:32                     ` Andrew Lunn
2021-09-30 13:47                       ` Rafał Miłecki
2021-09-30 13:42                   ` Rafał Miłecki
2021-09-30 13:54                     ` Russell King (Oracle)
2021-09-30 11:22     ` Rafał Miłecki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.