* NPD in phy_led_set_brightness+0x3c
From: Florian Fainelli @ 2023-06-07 20:50 UTC
To: netdev, Andrew Lunn, Heiner Kallweit, Christian Marangi
Cc: Vladimir Oltean, Pavel Machek, Lee Jones, linux-leds
Hi all,

While adding support for configuring the LEDs with Broadcom PHYs, I
stumbled upon the NULL pointer dereference (NPD) below, which happens
while issuing a reboot. The driver being used is GENET, which calls
phy_disconnect() in its shutdown/remove path. This is also reproducible
by unbinding the MDIO bus controller driver, e.g.:
# echo unimac-mdio.0 > unbind
The relevant section of the DT for this platform is the following:
	leds {
		leds@2 {
			default-state = "keep";
			color = <0x4>;
			reg = <0x2>;
		};
		leds@1 {
			default-state = "keep";
			color = <0x2>;
			reg = <0x1>;
		};
	};
No trigger is configured for either LED, so why is the workqueue being
kicked in the first place?
# cat /sys/class/leds/
mmc0::/ unimac-mdio-0:19:amber:/
mmc1::/ unimac-mdio-0:19:green:/
# cat /sys/class/leds/unimac-mdio-0\:19\:
unimac-mdio-0:19:amber:/ unimac-mdio-0:19:green:/
# cat /sys/class/leds/unimac-mdio-0\:19\:green\:/trigger
[none] rfkill-any rfkill-none kbd-scrolllock kbd-numlock kbd-capslock
kbd-kanalock kbd-shiftlock kbd-altgrlock kbd-ctrllock kbd-altlock
kbd-shiftllock kbd-shiftrlock kbd-ctrlllock kbd-ctrlrlock mmc1 mmc0
# cat /sys/class/leds/unimac-mdio-0\:19\:amber\:/trigger
[none] rfkill-any rfkill-none kbd-scrolllock kbd-numlock kbd-capslock
kbd-kanalock kbd-shiftlock kbd-altgrlock kbd-ctrllock kbd-altlock
kbd-shiftllock kbd-shiftrlock kbd-ctrlllock kbd-ctrlrlock mmc1 mmc0
# reboot -f
[ 55.476856] bcmgenet 8f00000.ethernet eth0: Link is Down
[ 55.553834] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001f0
[ 55.562674] Mem abort info:
[ 55.565482] ESR = 0x0000000096000005
[ 55.569245] EC = 0x25: DABT (current EL), IL = 32 bits
[ 55.574575] SET = 0, FnV = 0
[ 55.577641] EA = 0, S1PTW = 0
[ 55.580797] FSC = 0x05: level 1 translation fault
[ 55.585690] Data abort info:
[ 55.588582] ISV = 0, ISS = 0x00000005
[ 55.592432] CM = 0, WnR = 0
[ 55.595410] user pgtable: 4k pages, 39-bit VAs, pgdp=000000004815d000
[ 55.601870] [00000000000001f0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 55.610601] Internal error: Oops: 0000000096000005 [#1] SMP
[ 55.616190] Modules linked in: bdc udc_core
[ 55.620394] CPU: 2 PID: 46 Comm: kworker/2:1 Not tainted 6.4.0-rc1-g665543a0726c #76
[ 55.628156] Hardware name: BCM972180HB_V20 (DT)
[ 55.632697] Workqueue: events set_brightness_delayed
[ 55.637691] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 55.644669] pc : phy_led_set_brightness+0x3c/0x68
[ 55.649389] lr : phy_led_set_brightness+0x30/0x68
[ 55.654105] sp : ffffffc00acabd10
[ 55.657425] x29: ffffffc00acabd10 x28: 0000000000000000 x27: 0000000000000000
[ 55.664581] x26: 0000000000000000 x25: ffffff807dbb340d x24: 0000000000000000
[ 55.671736] x23: ffffff807dbb3400 x22: 0000000000000000 x21: ffffff8002e59c80
[ 55.678891] x20: ffffff8003d1f520 x19: ffffff8003d1f000 x18: 0000000000000000
[ 55.686046] x17: 74656e2f74656e72 x16: 656874652e303030 x15: 303066382f626472
[ 55.693202] x14: ffffff8003022de0 x13: 6e69622f7273752f x12: 0000000000000000
[ 55.700357] x11: ffffff8002d1d510 x10: 0000000000000870 x9 : ffffffc008660a7c
[ 55.707512] x8 : ffffff8002e77374 x7 : fefefefefefefeff x6 : 000073746e657665
[ 55.714667] x5 : ffffff8002e77374 x4 : 0000000000000000 x3 : 0000000000000000
[ 55.721822] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000
[ 55.728978] Call trace:
[ 55.731431] phy_led_set_brightness+0x3c/0x68
[ 55.735800] set_brightness_delayed_set_brightness+0x44/0x7c
[ 55.741476] set_brightness_delayed+0xc4/0x1a4
[ 55.745932] process_one_work+0x1a4/0x254
[ 55.749958] process_scheduled_works+0x44/0x48
[ 55.754415] worker_thread+0x1e8/0x264
[ 55.758176] kthread+0xcc/0xdc
[ 55.761242] ret_from_fork+0x10/0x20
[ 55.764833] Code: 940ed4c5 f941a660 2a1603e2 394622a1 (f940f803)
[ 55.770941] ---[ end trace 0000000000000000 ]---
--
Florian
* Re: NPD in phy_led_set_brightness+0x3c
From: Andrew Lunn @ 2023-06-07 21:32 UTC
To: Florian Fainelli
Cc: netdev, Heiner Kallweit, Christian Marangi, Vladimir Oltean,
Pavel Machek, Lee Jones, linux-leds
> No trigger is configured for either LED, so why is the workqueue being
> kicked in the first place?

Since setting an LED may sleep, it gets offloaded to a
workqueue.

My guess is that something in led_classdev_unregister() is triggering it,
maybe to put the LED into a known state before pulling the
plug. However, I don't see what.

I'm also wondering about ordering. The LED is registered with
devm_led_classdev_register_ext(), so maybe led_classdev_unregister()
is getting called too late? In that case we would need to replace devm_
with manual cleanup.

However, I've done lots of reboots while developing this code, so it's
interesting that you can trigger this and I've not seen it.

Andrew
* Re: NPD in phy_led_set_brightness+0x3c
From: Florian Fainelli @ 2023-06-07 22:29 UTC
To: Andrew Lunn
Cc: netdev, Heiner Kallweit, Christian Marangi, Vladimir Oltean,
Pavel Machek, Lee Jones, linux-leds
On 6/7/23 14:32, Andrew Lunn wrote:
>> No trigger is configured for either LED, so why is the workqueue being
>> kicked in the first place?
>
> Since setting an LED may sleep, it gets offloaded to a
> workqueue.
>
> My guess is that something in led_classdev_unregister() is triggering it,
> maybe to put the LED into a known state before pulling the
> plug. However, I don't see what.
>
> I'm also wondering about ordering. The LED is registered with
> devm_led_classdev_register_ext(), so maybe led_classdev_unregister()
> is getting called too late? In that case we would need to replace devm_
> with manual cleanup.
>
> However, I've done lots of reboots while developing this code, so it's
> interesting that you can trigger this and I've not seen it.

The faulting address matches the offset of led_brightness_set within
struct phy_driver, so it is phydev->drv that has become NULL:

(gdb) print /x (int)&((struct phy_driver *)0)->led_brightness_set
$1 = 0x1f0

So this would indeed look like a use-after-free here. If you tested
with a PHYLINK-enabled driver, you might not have seen it due to
phylink_disconnect_phy() being called with RTNL held?
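
For readers without a vmlinux at hand, the same reasoning can be sketched in plain C. This is only a minimal illustration, assuming a made-up `struct demo_driver` whose layout has nothing to do with the real `struct phy_driver`: when the base pointer is NULL, the faulting address equals the offset of the member being loaded, so `offsetof()` maps the address back to a field. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct phy_driver; the layout is illustrative only. */
struct demo_driver {
	char name[32];
	int (*probe)(void *dev);
	int (*led_brightness_set)(void *dev, unsigned char index, int value);
};

/* One table entry per member: its name and its byte offset in the struct. */
struct demo_member {
	const char *name;
	size_t offset;
};

#define DEMO_MEMBER(m) { #m, offsetof(struct demo_driver, m) }

static const struct demo_member demo_members[] = {
	DEMO_MEMBER(name),
	DEMO_MEMBER(probe),
	DEMO_MEMBER(led_brightness_set),
};

#define DEMO_NMEMBERS (sizeof(demo_members) / sizeof(demo_members[0]))

/*
 * Given the address reported by a fault on a NULL base pointer, return the
 * member that starts at that offset, or NULL if no member starts there.
 */
static const struct demo_member *
demo_member_at(const struct demo_member *tbl, size_t n, uintptr_t fault_addr)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < n; i++)
		if (tbl[i].offset == fault_addr)
			return &tbl[i];
	return NULL;
}
```

Calling `demo_member_at(demo_members, DEMO_NMEMBERS, offsetof(struct demo_driver, led_brightness_set))` returns the `led_brightness_set` entry, while an address that is not the start of any member returns NULL.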
--
Florian
* Re: NPD in phy_led_set_brightness+0x3c
From: Andrew Lunn @ 2023-06-08 1:30 UTC
To: Florian Fainelli
Cc: netdev, Heiner Kallweit, Christian Marangi, Vladimir Oltean,
Pavel Machek, Lee Jones, linux-leds
> (gdb) print /x (int)&((struct phy_driver *)0)->led_brightness_set
> $1 = 0x1f0
>
> So this would indeed look like a use-after-free here. If you tested with a
> PHYLINK-enabled driver, you might not have seen it due to
> phylink_disconnect_phy() being called with RTNL held?

Yes, I've been testing with mvneta, which uses phylink.

Andrew
* Re: NPD in phy_led_set_brightness+0x3c
From: Florian Fainelli @ 2023-06-08 17:33 UTC
To: Andrew Lunn
Cc: netdev, Heiner Kallweit, Christian Marangi, Vladimir Oltean,
Pavel Machek, Lee Jones, linux-leds
On 6/7/23 18:30, Andrew Lunn wrote:
>> (gdb) print /x (int)&((struct phy_driver *)0)->led_brightness_set
>> $1 = 0x1f0
>>
>> So this would indeed look like a use-after-free here. If you tested with a
>> PHYLINK-enabled driver, you might not have seen it due to
>> phylink_disconnect_phy() being called with RTNL held?
>
> Yes, I've been testing with mvneta, which uses phylink.

Hmm, this is really puzzling. The call trace below shows where we call
schedule_work(), which is in led_set_brightness_nopm(). However,
led_classdev_unregister() calls flush_work() to ensure the queued work
has completed. Is there something else in that call stack that prevents
the system workqueue from running?

[ 280.663384] ------------[ cut here ]------------
[ 280.668038] WARNING: CPU: 3 PID: 1497 at drivers/leds/led-core.c:333 led_set_brightness_nopm+0x68/0xf8
[ 280.677378] Modules linked in: bdc udc_core
[ 280.681585] CPU: 3 PID: 1497 Comm: reboot Not tainted 6.4.0-rc5-next-20230607-g27d73db94b91 #94
[ 280.690304] Hardware name: BCM972180HB_V20 (DT)
[ 280.694845] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 280.701824] pc : led_set_brightness_nopm+0x68/0xf8
[ 280.706628] lr : led_set_brightness_nosleep+0x2c/0x38
[ 280.711691] sp : ffffffc082ddb7b0
[ 280.715012] x29: ffffffc082ddb7b0 x28: ffffff8007a55780 x27: 0000000000000000
[ 280.722168] x26: ffffff8002fdcc90 x25: 0000000000000001 x24: 0000000000000000
[ 280.729323] x23: ffffff8002e6b000 x22: ffffffc082ddb898 x21: ffffffc080c4b676
[ 280.736480] x20: ffffff800792b990 x19: ffffff800792b898 x18: 0000000000000000
[ 280.743636] x17: 74656e2f74656e72 x16: 656874652e303030 x15: 303066382f626472
[ 280.750791] x14: ffffff8004a6ccd8 x13: 6e69622f7273752f x12: 0000000000000000
[ 280.757948] x11: ffffff8002d1c710 x10: ffffff8002e6b2a0 x9 : ffffffc0807ad6c0
[ 280.765103] x8 : ffffffc080595964 x7 : ffffffc08059550c x6 : ffffff8002e6b2a0
[ 280.772258] x5 : 0000000000000000 x4 : ffffff800792b8b0 x3 : ffffff800792b8b0
[ 280.779414] x2 : ffffff800792b898 x1 : 0000000000000000 x0 : 0000000000000040
[ 280.786570] Call trace:
[ 280.789021] led_set_brightness_nopm+0x68/0xf8
[ 280.793476] led_set_brightness_nosleep+0x2c/0x38
[ 280.798192] led_set_brightness+0x9c/0xa0
[ 280.802210] led_classdev_unregister+0x78/0xd0
[ 280.806665] devm_led_classdev_release+0x18/0x20
[ 280.811294] release_nodes+0x70/0x84
[ 280.814884] devres_release_all+0xa0/0xd4
[ 280.818905] device_unbind_cleanup+0x1c/0x60
[ 280.823189] device_release_driver_internal+0xa8/0x128
[ 280.828341] device_release_driver+0x1c/0x24
[ 280.832622] bus_remove_device+0x108/0x12c
[ 280.836731] device_del+0x194/0x2ec
[ 280.840230] phy_device_remove+0x1c/0x3c
[ 280.844167] phy_mdio_device_remove+0x14/0x1c
[ 280.848537] mdiobus_unregister+0x6c/0xa0
[ 280.852560] unimac_mdio_remove+0x20/0x4c
[ 280.856582] platform_remove+0x50/0x68
[ 280.860342] device_remove+0x50/0x74
[ 280.863929] device_release_driver_internal+0x80/0x128
[ 280.869079] device_release_driver+0x1c/0x24
[ 280.873360] bus_remove_device+0x108/0x12c
[ 280.877471] device_del+0x194/0x2ec
[ 280.880969] platform_device_del+0x2c/0x90
[ 280.885077] platform_device_unregister+0x1c/0x30
[ 280.889793] bcmgenet_mii_exit+0x40/0x4c
[ 280.893728] bcmgenet_remove+0x2c/0x44
[ 280.897489] bcmgenet_shutdown+0x14/0x1c
[ 280.901422] platform_shutdown+0x28/0x34
[ 280.905355] device_shutdown+0x158/0x1d8
[ 280.909290] kernel_restart_prepare+0x3c/0x44
[ 280.913661] kernel_restart+0x1c/0x7c
[ 280.917332] __do_sys_reboot+0x170/0x1f4
[ 280.921265] __arm64_sys_reboot+0x24/0x2c
[ 280.925286] invoke_syscall+0x80/0x114
[ 280.929047] el0_svc_common.constprop.1+0xb8/0xe4
[ 280.933762] do_el0_svc+0x90/0x9c
[ 280.937086] el0_svc+0x1c/0x44
[ 280.940154] el0t_64_sync_handler+0x100/0x150
[ 280.944524] el0t_64_sync+0x14c/0x150
[ 280.948198] ---[ end trace 0000000000000000 ]---
[ 280.952885] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001f0
[ 280.961697] Mem abort info:
[ 280.964502] ESR = 0x0000000096000005
[ 280.968264] EC = 0x25: DABT (current EL), IL = 32 bits
[ 280.973594] SET = 0, FnV = 0
[ 280.976661] EA = 0, S1PTW = 0
[ 280.979815] FSC = 0x05: level 1 translation fault
[ 280.984708] Data abort info:
[ 280.987600] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[ 280.993101] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 280.998170] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 281.003500] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000045cde000
[ 281.009960] [00000000000001f0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 281.018691] Internal error: Oops: 0000000096000005 [#1] SMP
[ 281.024280] Modules linked in: bdc udc_core
[ 281.028480] CPU: 3 PID: 817 Comm: kworker/3:2 Tainted: G W 6.4.0-rc5-next-20230607-g27d73db94b91 #94
[ 281.039024] Hardware name: BCM972180HB_V20 (DT)
[ 281.043565] Workqueue: events set_brightness_delayed
[ 281.048543] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 281.055520] pc : phy_led_set_brightness+0x3c/0x68
[ 281.060238] lr : phy_led_set_brightness+0x30/0x68
[ 281.064955] sp : ffffffc0845bbd20
[ 281.068276] x29: ffffffc0845bbd20 x28: 0000000000000000 x27: 0000000000000000
[ 281.075432] x26: 0000000000000000 x25: 0000000000000000 x24: ffffff807dbcc40d
[ 281.082587] x23: ffffff800792b960 x22: 0000000000000000 x21: ffffff800792b880
[ 281.089743] x20: ffffff8002e6b520 x19: ffffff8002e6b000 x18: 0000000000000000
[ 281.096899] x17: 74656e2f74656e72 x16: 656874652e303030 x15: 303066382f626472
[ 281.104054] x14: ffffff8004a6ccd8 x13: 6e69622f7273752f x12: 0000000000000000
[ 281.111209] x11: ffffff8002d1c710 x10: 0000000000000870 x9 : ffffffc080663bd0
[ 281.118364] x8 : ffffff80065f1a80 x7 : fefefefefefefeff x6 : 000073746e657665
[ 281.125519] x5 : ffffff80065f1a80 x4 : 0000000000000000 x3 : 0000000000000000
[ 281.132676] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000
[ 281.139831] Call trace:
[ 281.142283] phy_led_set_brightness+0x3c/0x68
[ 281.146652] set_brightness_delayed_set_brightness+0x44/0x7c
[ 281.152328] set_brightness_delayed+0xc4/0x1a4
[ 281.156783] process_one_work+0x1c0/0x284
[ 281.160806] process_scheduled_works+0x44/0x48
[ 281.165263] worker_thread+0x1e8/0x264
[ 281.169023] kthread+0xcc/0xdc
[ 281.172089] ret_from_fork+0x10/0x20
[ 281.175678] Code: 940edf9f f941a660 2a1603e2 3946c2a1 (f940f803)
[ 281.181786] ---[ end trace 0000000000000000 ]---
--
Florian
* Re: NPD in phy_led_set_brightness+0x3c
From: Andrew Lunn @ 2023-06-08 19:13 UTC
To: Florian Fainelli
Cc: netdev, Heiner Kallweit, Christian Marangi, Vladimir Oltean,
Pavel Machek, Lee Jones, linux-leds
On Thu, Jun 08, 2023 at 10:33:30AM -0700, Florian Fainelli wrote:
> On 6/7/23 18:30, Andrew Lunn wrote:
> > > (gdb) print /x (int)&((struct phy_driver *)0)->led_brightness_set
> > > $1 = 0x1f0
> > >
> > > So this would indeed look like a use-after-free here. If you tested with a
> > > PHYLINK-enabled driver, you might not have seen it due to
> > > phylink_disconnect_phy() being called with RTNL held?
> >
> > Yes, I've been testing with mvneta, which uses phylink.
>
> Hmm, this is really puzzling. The call trace below shows where we call
> schedule_work(), which is in led_set_brightness_nopm(). However,
> led_classdev_unregister() calls flush_work() to ensure the queued work
> has completed. Is there something else in that call stack that prevents
> the system workqueue from running?

Has phy_remove() already been called? The last thing it does is:

	phydev->drv = NULL;

This is one of the differences between my system and yours. With
mvneta, the mdio bus driver is an independent device. You have a
combined MAC and MDIO bus driver.
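
The ordering concern can be modelled in a few lines of userspace C. This is a toy sketch, not phylib code: every `demo_*` name is hypothetical, the workqueue is reduced to a single pending flag, and `flush_work()` is stood in for by running the pending work directly. It assumes, as the call trace earlier in the thread suggests, that unregistering the LED turns it off (which queues a brightness update) and that devm-managed resources are released only after the driver's remove callback has returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver ops; stands in for the led_brightness_set callback. */
struct demo_drv {
	int (*led_brightness_set)(int value);
};

struct demo_phy {
	const struct demo_drv *drv; /* cleared at the end of remove() */
	bool led_registered;
	bool work_pending;          /* a deferred brightness update is queued */
	int delayed_value;
};

static int demo_hw_value = -1;      /* last value that reached the "hardware" */

static int demo_set_hw(int value)
{
	demo_hw_value = value;
	return 0;
}

static const struct demo_drv demo_driver = {
	.led_brightness_set = demo_set_hw,
};

/* Non-sleeping entry point: record the value and defer the real work. */
static void demo_led_set_nosleep(struct demo_phy *phy, int value)
{
	assert(phy != NULL);
	phy->delayed_value = value;
	phy->work_pending = true;
}

/* Worker body: returns -1 where the kernel would dereference a NULL drv. */
static int demo_run_work(struct demo_phy *phy)
{
	if (!phy->work_pending)
		return 0;
	phy->work_pending = false;
	if (!phy->drv)
		return -1;
	return phy->drv->led_brightness_set(phy->delayed_value);
}

/* Unregister: switch the LED off (queues work), then flush that work. */
static int demo_led_unregister(struct demo_phy *phy)
{
	if (!phy->led_registered)
		return 0;
	phy->led_registered = false;
	demo_led_set_nosleep(phy, 0);
	return demo_run_work(phy);
}

/* devm-style teardown: remove() finishes first, resources are released after. */
static int demo_teardown_devm(struct demo_phy *phy)
{
	phy->drv = NULL;                 /* last step of remove() */
	return demo_led_unregister(phy); /* devres release runs afterwards */
}

/* Manual teardown: unregister the LED inside remove(), before drv is cleared. */
static int demo_teardown_manual(struct demo_phy *phy)
{
	int ret = demo_led_unregister(phy);

	phy->drv = NULL;
	return ret;
}
```

In this model `demo_teardown_devm()` returns -1, the point at which the kernel would dereference the NULL `drv`, while `demo_teardown_manual()` reaches the hardware setter and returns 0. The model says nothing about the lifetime of the MDIO bus itself.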
Andrew
* Re: NPD in phy_led_set_brightness+0x3c
From: Florian Fainelli @ 2023-06-08 22:03 UTC
To: Andrew Lunn
Cc: netdev, Heiner Kallweit, Christian Marangi, Vladimir Oltean,
Pavel Machek, Lee Jones, linux-leds
On 6/8/23 12:13, Andrew Lunn wrote:
> On Thu, Jun 08, 2023 at 10:33:30AM -0700, Florian Fainelli wrote:
>> On 6/7/23 18:30, Andrew Lunn wrote:
>>>> (gdb) print /x (int)&((struct phy_driver *)0)->led_brightness_set
>>>> $1 = 0x1f0
>>>>
>>>> So this would indeed look like a use-after-free here. If you tested with a
>>>> PHYLINK-enabled driver, you might not have seen it due to
>>>> phylink_disconnect_phy() being called with RTNL held?
>>>
>>> Yes, I've been testing with mvneta, which uses phylink.
>>
>> Hmm, this is really puzzling. The call trace below shows where we call
>> schedule_work(), which is in led_set_brightness_nopm(). However,
>> led_classdev_unregister() calls flush_work() to ensure the queued work
>> has completed. Is there something else in that call stack that prevents
>> the system workqueue from running?
>
> Has phy_remove() already been called? The last thing it does is:
>
> 	phydev->drv = NULL;
>
> This is one of the differences between my system and yours. With
> mvneta, the mdio bus driver is an independent device. You have a
> combined MAC and MDIO bus driver.

Yes, good point. I switched to the patch below; however, that still
triggers a delayed led_brightness_set call, which now gets scheduled
*after* we have removed the MDIO bus controller and shut down the MDIO
bus clock:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 2cad9cc3f6b8..f838c4f92524 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -3053,7 +3053,7 @@ static int of_phy_led(struct phy_device *phydev,
 	init_data.fwnode = of_fwnode_handle(led);
 	init_data.devname_mandatory = true;
 
-	err = devm_led_classdev_register_ext(dev, cdev, &init_data);
+	err = led_classdev_register_ext(dev, cdev, &init_data);
 	if (err)
 		return err;
 
@@ -3298,6 +3298,14 @@ static int phy_probe(struct device *dev)
 	return err;
 }
 
+static void phy_remove_leds(struct phy_device *phydev)
+{
+	struct phy_led *phyled;
+
+	list_for_each_entry(phyled, &phydev->leds, list)
+		led_classdev_unregister(&phyled->led_cdev);
+}
+
 static int phy_remove(struct device *dev)
 {
 	struct phy_device *phydev = to_phy_device(dev);
@@ -3315,6 +3323,8 @@ static int phy_remove(struct device *dev)
 	/* Assert the reset signal */
 	phy_device_reset(phydev, 1);
 
+	phy_remove_leds(phydev);
+
 	phydev->drv = NULL;
 
 	return 0;
--
Florian