All of lore.kernel.org
 help / color / mirror / Atom feed
* Bug: power supply - NULL pointer dereference in bq27x00 driver
@ 2015-05-18  9:40 ` Dr. H. Nikolaus Schaller
  0 siblings, 0 replies; 12+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-05-18  9:40 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Rodolfo Giometti, Sebastian Reichel, LKML, Marek Belisko,
	Linux PM mailing list, Dmitry Eremin-Solenikov, David Woodhouse,
	List for communicating with real GTA04 owners

Hi,
we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.

It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458

The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
sets di->bat *after* return. The old code did pass an uninitialized struct pointer.

Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
is properly initialized.

I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
right before we see the segfault.

The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
during registration.

Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
I don’t know what it does to the uevent and if it restores previous operation.

It could have been that it was for good purpose that power_supply_register_no_ws() did not return
by value, but by reference to the di->bat struct:

-	ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
+	di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);

So that code called within the context of power_supply_register_no_ws() could already
refer to initialized di->bat.


BR and thanks,
Nikolaus Schaller


[   11.879943] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[   11.888519] pgd = c0004000
[   11.891357] [0000000c] *pgd=00000000
[   11.895141] Internal error: Oops: 5 [#1] SMP ARM
[   11.899963] Modules linked in: bq27x00_battery w1_bq27000 ov9655 v4l2_common omap_hdq snd_soc_omap_mcbsp videodev hmc5843_i2c(C) lis3lv02d_i2c snd_soc_omap lis3lv02d itg3200 snd_pcm_dmaengine tsc2007 bmp085_i2c bma150 hmc5843_core(C) media at24 input_polldev leds_tca6507 rtc_twl twl4030_pwrbutton twl4030_keypad twl4030_madc snd_soc_twl4030 twl4030_vibra ehci_omap
[   11.933898] CPU: 0 PID: 1164 Comm: w1_bus_master1 Tainted: G         C      4.1.0-rc3-gta04+ #1086
[   11.943267] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[   11.949829] task: dd2729c0 ti: dd276000 task.ti: dd276000
[   11.955505] PC is at __power_supply_is_supplied_by+0x8/0xc0
[   11.961364] LR is at __power_supply_am_i_supplied+0x18/0x48
[   11.967193] pc : [<c0469f4c>]    lr : [<c046a058>]    psr: a0000013
[   11.967193] sp : dd277bb8  ip : 00000000  fp : c092f82c
[   11.979217] r10: de348000  r9 : dd18cc20  r8 : dd18cc20
[   11.984680] r7 : de5c9420  r6 : de5c9400  r5 : de5c9400  r4 : 00000000
[   11.991516] r3 : de5c9648  r2 : 00000000  r1 : 00000000  r0 : de5c9400
[   11.998352] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   12.006011] Control: 10c5387d  Table: 9c1b0019  DAC: 00000015
[   12.012054] Process w1_bus_master1 (pid: 1164, stack limit = 0xdd276218)
[   12.019073] Stack: (0xdd277bb8 to 0xdd278000)
[   12.023620] 7ba0:                                                       de5c9648 00000000
[   12.032196] 7bc0: de5c9400 c046a040 de5c9420 c046a058 de5c9420 00000000 00000000 dd277bf4
[   12.040771] 7be0: 00000000 c0374ebc c05c4268 dd18eec4 dd18cc20 de27fea4 de5c9648 00000000
[   12.049346] 7c00: dd18eee8 00000000 dd277c3c dd18ee10 dd18eec4 bf0d0de4 00000000 00000000
[   12.057922] 7c20: dd377000 c092f82c dd18cc20 c0469508 b6db6db7 c046a140 dd18cc20 c092f82c
[   12.066467] 7c40: 00000000 dd377000 00000000 00000000 dd18cc00 c046a42c dd18cc20 de348000
[   12.075042] 7c60: c046a3b0 de348000 dd18cc28 dd18cc20 de0918c0 c0608dd8 00000000 00000000
[   12.083618] 7c80: c07b5bb1 c0371f0c c07b5bb1 c02c72c4 de348000 dd277ca8 c07b5bb1 de348000
[   12.092193] 7ca0: de348000 dd18cc28 00000000 c02c7544 00000007 00000006 00000000 c0083394
[   12.100769] 7cc0: dd2729c0 dd2f6540 c07b2ae3 dc1f7950 dd18c810 dd18cc20 00000000 dd18cc28
[   12.109344] 7ce0: dc1f7950 dd18c810 00000000 00000000 00000000 c0371938 dd18cc20 00000000
[   12.117919] 7d00: c0008280 00000080 dd18cc00 dd18cc20 00000000 dc1f7950 dd18c810 c0469cd8
[   12.126464] 7d20: bf0d124f dd277d44 1f1c4000 c05ae99c bf0d124f dd18ee10 dc1f7950 bf0cc121
[   12.135040] 7d40: 0000001b dd014c50 00000001 bf0d0864 00000000 dd18ee10 00000000 00000000
[   12.143615] 7d60: ffffffed dd18c810 bf0d1670 c03752c0 dd18c810 00000000 bf0d1670 c0373bd4
[   12.152191] 7d80: 00000007 dd18c810 bf0d1670 c0373ed8 00000000 c0373e34 00000000 dd18c810
[   12.160766] 7da0: dd277da8 c037265c de0948d4 dd2f6194 dd18c810 dd18c810 dd18c844 dd18c810
[   12.169342] 7dc0: bf0cc160 c0373dbc dd18c810 c0918590 dd18c810 c0373294 dd18c810 00000000
[   12.177917] 7de0: dd18c818 c0371940 dd18c810 0000002f dd18c810 00000000 dd18c800 00000000
[   12.186492] 7e00: 00000000 bf0cc160 dd0144c0 c03754bc dd014c50 dd18c800 dd014c00 00000000
[   12.195037] 7e20: bf0cc160 bf0cc058 dd014c00 dd014c00 bf0cc174 c04663c0 dd014c00 00000000
[   12.203613] 7e40: dd014c50 c0466694 00000000 00000000 dd014c00 dd014400 dd277ea0 c0466cfc
[   12.212188] 7e60: dd01444c c00835e8 dd01442c 60000013 dd014450 00000000 00000000 00000000
[   12.220764] 7e80: dd01444c dd014400 dd0144a8 0000003d c0466d94 dd01444c c0466d94 c0466e10
[   12.229339] 7ea0: 00000001 3d000000 00000001 3d000000 00000001 00000000 dd014400 bf0ba0ac
[   12.237915] 7ec0: 00000001 00000000 000000f0 dd014400 dd0147a8 000000f0 000000f0 c0469410
[   12.246490] 7ee0: dd014400 dd01442c dd014470 c04675d0 dd2729c0 dd014400 00000000 dd01442c
[   12.255035] 7f00: dd0144c0 c1111c48 000003e8 c046773c 00000000 00000000 dd35a980 dd014400
[   12.263610] 7f20: c04676dc 00000000 00000000 c0059838 dfa71c80 00000000 00000000 dd014400
[   12.272186] 7f40: 00000000 00000000 dead4ead ffffffff ffffffff c0952f54 00000000 00000000
[   12.280761] 7f60: c0759a92 dd277f64 dd277f64 00000000 00000000 dead4ead ffffffff ffffffff
[   12.289337] 7f80: c0952f54 00000000 00000000 c0759a92 dd277f90 dd277f90 dd277fac dd35a980
[   12.297882] 7fa0: c0059764 00000000 00000000 c000ed50 00000000 00000000 00000000 00000000
[   12.306457] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   12.315032] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 9fef6821 9fef6c21
[   12.323608] [<c0469f4c>] (__power_supply_is_supplied_by) from [<c046a058>] (__power_supply_am_i_supplied+0x18/0x48)
[   12.334564] [<c046a058>] (__power_supply_am_i_supplied) from [<c0374ebc>] (class_for_each_device+0x68/0xa8)
[   12.344787] [<c0374ebc>] (class_for_each_device) from [<bf0d0de4>] (bq27x00_battery_get_property+0x19c/0x3a8 [bq27x00_battery])
[   12.356811] [<bf0d0de4>] (bq27x00_battery_get_property [bq27x00_battery]) from [<c0469508>] (power_supply_get_property+0x1c/0x28)
[   12.369018] [<c0469508>] (power_supply_get_property) from [<c046a140>] (power_supply_show_property+0x48/0x1ac)
[   12.379516] [<c046a140>] (power_supply_show_property) from [<c046a42c>] (power_supply_uevent+0x7c/0x148)
[   12.389465] [<c046a42c>] (power_supply_uevent) from [<c0371f0c>] (dev_uevent+0x174/0x1c0)
[   12.398040] [<c0371f0c>] (dev_uevent) from [<c02c7544>] (kobject_uevent_env+0x184/0x47c)
[   12.406524] [<c02c7544>] (kobject_uevent_env) from [<c0371938>] (device_add+0x1fc/0x34c)
[   12.415008] [<c0371938>] (device_add) from [<c0469cd8>] (__power_supply_register+0x144/0x290)
[   12.423950] [<c0469cd8>] (__power_supply_register) from [<bf0d0864>] (bq27x00_powersupply_init+0x120/0x1c0 [bq27x00_battery])
[   12.435791] [<bf0d0864>] (bq27x00_powersupply_init [bq27x00_battery]) from [<c03752c0>] (platform_drv_probe+0x48/0x90)
[   12.447021] [<c03752c0>] (platform_drv_probe) from [<c0373bd4>] (really_probe+0xd4/0x238)
[   12.455596] [<c0373bd4>] (really_probe) from [<c0373e34>] (driver_probe_device+0x30/0x48)
[   12.464141] [<c0373e34>] (driver_probe_device) from [<c037265c>] (bus_for_each_drv+0x4c/0x84)
[   12.473083] [<c037265c>] (bus_for_each_drv) from [<c0373dbc>] (device_attach+0x60/0x8c)
[   12.481475] [<c0373dbc>] (device_attach) from [<c0373294>] (bus_probe_device+0x28/0xa0)
[   12.489868] [<c0373294>] (bus_probe_device) from [<c0371940>] (device_add+0x204/0x34c)
[   12.498168] [<c0371940>] (device_add) from [<c03754bc>] (platform_device_add+0x138/0x1c8)
[   12.506744] [<c03754bc>] (platform_device_add) from [<bf0cc058>] (w1_bq27000_add_slave+0x50/0x78 [w1_bq27000])
[   12.517242] [<bf0cc058>] (w1_bq27000_add_slave [w1_bq27000]) from [<c04663c0>] (w1_family_notify+0x44/0xc8)
[   12.527465] [<c04663c0>] (w1_family_notify) from [<c0466694>] (__w1_attach_slave_device+0xe4/0x154)
[   12.536956] [<c0466694>] (__w1_attach_slave_device) from [<c0466cfc>] (w1_attach_slave_device+0x13c/0x1d4)
[   12.547058] [<c0466cfc>] (w1_attach_slave_device) from [<c0466e10>] (w1_slave_found+0x7c/0x98)
[   12.556091] [<c0466e10>] (w1_slave_found) from [<bf0ba0ac>] (omap_w1_search_bus+0x54/0x5c [omap_hdq])
[   12.565765] [<bf0ba0ac>] (omap_w1_search_bus [omap_hdq]) from [<c0469410>] (w1_search_devices+0x3c/0x48)
[   12.575714] [<c0469410>] (w1_search_devices) from [<c04675d0>] (w1_search_process_cb+0x64/0x108)
[   12.584930] [<c04675d0>] (w1_search_process_cb) from [<c046773c>] (w1_process+0x60/0x164)
[   12.593505] [<c046773c>] (w1_process) from [<c0059838>] (kthread+0xd4/0xe8)
[   12.600799] [<c0059838>] (kthread) from [<c000ed50>] (ret_from_fork+0x14/0x24)
[   12.608367] Code: e12fff33 e8bd8008 e92d40f8 e1a06000 (e591500c) 
[   12.614929] ---[ end trace 2ba904cc466626a7 ]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Bug: power supply - NULL pointer dereference in bq27x00 driver
@ 2015-05-18  9:40 ` Dr. H. Nikolaus Schaller
  0 siblings, 0 replies; 12+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-05-18  9:40 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Rodolfo Giometti, Sebastian Reichel, LKML, Marek Belisko,
	Linux PM mailing list, Dmitry Eremin-Solenikov, David Woodhouse,
	List for communicating with real GTA04 owners

Hi,
we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.

It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458

The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
sets di->bat *after* return. The old code did pass an uninitialized struct pointer.

Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
is properly initialized.

I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
right before we see the segfault.

The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
during registration.

Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
I don’t know what it does to the uevent and if it restores previous operation.

It could have been that it was for good purpose that power_supply_register_no_ws() did not return
by value, but by reference to the di->bat struct:

-	ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
+	di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);

So that code called within the context of power_supply_register_no_ws() could already
refer to initialized di->bat.


BR and thanks,
Nikolaus Schaller


[   11.879943] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[   11.888519] pgd = c0004000
[   11.891357] [0000000c] *pgd=00000000
[   11.895141] Internal error: Oops: 5 [#1] SMP ARM
[   11.899963] Modules linked in: bq27x00_battery w1_bq27000 ov9655 v4l2_common omap_hdq snd_soc_omap_mcbsp videodev hmc5843_i2c(C) lis3lv02d_i2c snd_soc_omap lis3lv02d itg3200 snd_pcm_dmaengine tsc2007 bmp085_i2c bma150 hmc5843_core(C) media at24 input_polldev leds_tca6507 rtc_twl twl4030_pwrbutton twl4030_keypad twl4030_madc snd_soc_twl4030 twl4030_vibra ehci_omap
[   11.933898] CPU: 0 PID: 1164 Comm: w1_bus_master1 Tainted: G         C      4.1.0-rc3-gta04+ #1086
[   11.943267] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[   11.949829] task: dd2729c0 ti: dd276000 task.ti: dd276000
[   11.955505] PC is at __power_supply_is_supplied_by+0x8/0xc0
[   11.961364] LR is at __power_supply_am_i_supplied+0x18/0x48
[   11.967193] pc : [<c0469f4c>]    lr : [<c046a058>]    psr: a0000013
[   11.967193] sp : dd277bb8  ip : 00000000  fp : c092f82c
[   11.979217] r10: de348000  r9 : dd18cc20  r8 : dd18cc20
[   11.984680] r7 : de5c9420  r6 : de5c9400  r5 : de5c9400  r4 : 00000000
[   11.991516] r3 : de5c9648  r2 : 00000000  r1 : 00000000  r0 : de5c9400
[   11.998352] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[   12.006011] Control: 10c5387d  Table: 9c1b0019  DAC: 00000015
[   12.012054] Process w1_bus_master1 (pid: 1164, stack limit = 0xdd276218)
[   12.019073] Stack: (0xdd277bb8 to 0xdd278000)
[   12.023620] 7ba0:                                                       de5c9648 00000000
[   12.032196] 7bc0: de5c9400 c046a040 de5c9420 c046a058 de5c9420 00000000 00000000 dd277bf4
[   12.040771] 7be0: 00000000 c0374ebc c05c4268 dd18eec4 dd18cc20 de27fea4 de5c9648 00000000
[   12.049346] 7c00: dd18eee8 00000000 dd277c3c dd18ee10 dd18eec4 bf0d0de4 00000000 00000000
[   12.057922] 7c20: dd377000 c092f82c dd18cc20 c0469508 b6db6db7 c046a140 dd18cc20 c092f82c
[   12.066467] 7c40: 00000000 dd377000 00000000 00000000 dd18cc00 c046a42c dd18cc20 de348000
[   12.075042] 7c60: c046a3b0 de348000 dd18cc28 dd18cc20 de0918c0 c0608dd8 00000000 00000000
[   12.083618] 7c80: c07b5bb1 c0371f0c c07b5bb1 c02c72c4 de348000 dd277ca8 c07b5bb1 de348000
[   12.092193] 7ca0: de348000 dd18cc28 00000000 c02c7544 00000007 00000006 00000000 c0083394
[   12.100769] 7cc0: dd2729c0 dd2f6540 c07b2ae3 dc1f7950 dd18c810 dd18cc20 00000000 dd18cc28
[   12.109344] 7ce0: dc1f7950 dd18c810 00000000 00000000 00000000 c0371938 dd18cc20 00000000
[   12.117919] 7d00: c0008280 00000080 dd18cc00 dd18cc20 00000000 dc1f7950 dd18c810 c0469cd8
[   12.126464] 7d20: bf0d124f dd277d44 1f1c4000 c05ae99c bf0d124f dd18ee10 dc1f7950 bf0cc121
[   12.135040] 7d40: 0000001b dd014c50 00000001 bf0d0864 00000000 dd18ee10 00000000 00000000
[   12.143615] 7d60: ffffffed dd18c810 bf0d1670 c03752c0 dd18c810 00000000 bf0d1670 c0373bd4
[   12.152191] 7d80: 00000007 dd18c810 bf0d1670 c0373ed8 00000000 c0373e34 00000000 dd18c810
[   12.160766] 7da0: dd277da8 c037265c de0948d4 dd2f6194 dd18c810 dd18c810 dd18c844 dd18c810
[   12.169342] 7dc0: bf0cc160 c0373dbc dd18c810 c0918590 dd18c810 c0373294 dd18c810 00000000
[   12.177917] 7de0: dd18c818 c0371940 dd18c810 0000002f dd18c810 00000000 dd18c800 00000000
[   12.186492] 7e00: 00000000 bf0cc160 dd0144c0 c03754bc dd014c50 dd18c800 dd014c00 00000000
[   12.195037] 7e20: bf0cc160 bf0cc058 dd014c00 dd014c00 bf0cc174 c04663c0 dd014c00 00000000
[   12.203613] 7e40: dd014c50 c0466694 00000000 00000000 dd014c00 dd014400 dd277ea0 c0466cfc
[   12.212188] 7e60: dd01444c c00835e8 dd01442c 60000013 dd014450 00000000 00000000 00000000
[   12.220764] 7e80: dd01444c dd014400 dd0144a8 0000003d c0466d94 dd01444c c0466d94 c0466e10
[   12.229339] 7ea0: 00000001 3d000000 00000001 3d000000 00000001 00000000 dd014400 bf0ba0ac
[   12.237915] 7ec0: 00000001 00000000 000000f0 dd014400 dd0147a8 000000f0 000000f0 c0469410
[   12.246490] 7ee0: dd014400 dd01442c dd014470 c04675d0 dd2729c0 dd014400 00000000 dd01442c
[   12.255035] 7f00: dd0144c0 c1111c48 000003e8 c046773c 00000000 00000000 dd35a980 dd014400
[   12.263610] 7f20: c04676dc 00000000 00000000 c0059838 dfa71c80 00000000 00000000 dd014400
[   12.272186] 7f40: 00000000 00000000 dead4ead ffffffff ffffffff c0952f54 00000000 00000000
[   12.280761] 7f60: c0759a92 dd277f64 dd277f64 00000000 00000000 dead4ead ffffffff ffffffff
[   12.289337] 7f80: c0952f54 00000000 00000000 c0759a92 dd277f90 dd277f90 dd277fac dd35a980
[   12.297882] 7fa0: c0059764 00000000 00000000 c000ed50 00000000 00000000 00000000 00000000
[   12.306457] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   12.315032] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 9fef6821 9fef6c21
[   12.323608] [<c0469f4c>] (__power_supply_is_supplied_by) from [<c046a058>] (__power_supply_am_i_supplied+0x18/0x48)
[   12.334564] [<c046a058>] (__power_supply_am_i_supplied) from [<c0374ebc>] (class_for_each_device+0x68/0xa8)
[   12.344787] [<c0374ebc>] (class_for_each_device) from [<bf0d0de4>] (bq27x00_battery_get_property+0x19c/0x3a8 [bq27x00_battery])
[   12.356811] [<bf0d0de4>] (bq27x00_battery_get_property [bq27x00_battery]) from [<c0469508>] (power_supply_get_property+0x1c/0x28)
[   12.369018] [<c0469508>] (power_supply_get_property) from [<c046a140>] (power_supply_show_property+0x48/0x1ac)
[   12.379516] [<c046a140>] (power_supply_show_property) from [<c046a42c>] (power_supply_uevent+0x7c/0x148)
[   12.389465] [<c046a42c>] (power_supply_uevent) from [<c0371f0c>] (dev_uevent+0x174/0x1c0)
[   12.398040] [<c0371f0c>] (dev_uevent) from [<c02c7544>] (kobject_uevent_env+0x184/0x47c)
[   12.406524] [<c02c7544>] (kobject_uevent_env) from [<c0371938>] (device_add+0x1fc/0x34c)
[   12.415008] [<c0371938>] (device_add) from [<c0469cd8>] (__power_supply_register+0x144/0x290)
[   12.423950] [<c0469cd8>] (__power_supply_register) from [<bf0d0864>] (bq27x00_powersupply_init+0x120/0x1c0 [bq27x00_battery])
[   12.435791] [<bf0d0864>] (bq27x00_powersupply_init [bq27x00_battery]) from [<c03752c0>] (platform_drv_probe+0x48/0x90)
[   12.447021] [<c03752c0>] (platform_drv_probe) from [<c0373bd4>] (really_probe+0xd4/0x238)
[   12.455596] [<c0373bd4>] (really_probe) from [<c0373e34>] (driver_probe_device+0x30/0x48)
[   12.464141] [<c0373e34>] (driver_probe_device) from [<c037265c>] (bus_for_each_drv+0x4c/0x84)
[   12.473083] [<c037265c>] (bus_for_each_drv) from [<c0373dbc>] (device_attach+0x60/0x8c)
[   12.481475] [<c0373dbc>] (device_attach) from [<c0373294>] (bus_probe_device+0x28/0xa0)
[   12.489868] [<c0373294>] (bus_probe_device) from [<c0371940>] (device_add+0x204/0x34c)
[   12.498168] [<c0371940>] (device_add) from [<c03754bc>] (platform_device_add+0x138/0x1c8)
[   12.506744] [<c03754bc>] (platform_device_add) from [<bf0cc058>] (w1_bq27000_add_slave+0x50/0x78 [w1_bq27000])
[   12.517242] [<bf0cc058>] (w1_bq27000_add_slave [w1_bq27000]) from [<c04663c0>] (w1_family_notify+0x44/0xc8)
[   12.527465] [<c04663c0>] (w1_family_notify) from [<c0466694>] (__w1_attach_slave_device+0xe4/0x154)
[   12.536956] [<c0466694>] (__w1_attach_slave_device) from [<c0466cfc>] (w1_attach_slave_device+0x13c/0x1d4)
[   12.547058] [<c0466cfc>] (w1_attach_slave_device) from [<c0466e10>] (w1_slave_found+0x7c/0x98)
[   12.556091] [<c0466e10>] (w1_slave_found) from [<bf0ba0ac>] (omap_w1_search_bus+0x54/0x5c [omap_hdq])
[   12.565765] [<bf0ba0ac>] (omap_w1_search_bus [omap_hdq]) from [<c0469410>] (w1_search_devices+0x3c/0x48)
[   12.575714] [<c0469410>] (w1_search_devices) from [<c04675d0>] (w1_search_process_cb+0x64/0x108)
[   12.584930] [<c04675d0>] (w1_search_process_cb) from [<c046773c>] (w1_process+0x60/0x164)
[   12.593505] [<c046773c>] (w1_process) from [<c0059838>] (kthread+0xd4/0xe8)
[   12.600799] [<c0059838>] (kthread) from [<c000ed50>] (ret_from_fork+0x14/0x24)
[   12.608367] Code: e12fff33 e8bd8008 e92d40f8 e1a06000 (e591500c) 
[   12.614929] ---[ end trace 2ba904cc466626a7 ]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
  2015-05-18  9:40 ` Dr. H. Nikolaus Schaller
@ 2015-05-18 12:32   ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 12+ messages in thread
From: Krzysztof Kozlowski @ 2015-05-18 12:32 UTC (permalink / raw)
  To: Dr. H. Nikolaus Schaller
  Cc: Krzysztof Kozlowski, Rodolfo Giometti, Sebastian Reichel, LKML,
	Marek Belisko, Linux PM mailing list, Dmitry Eremin-Solenikov,
	David Woodhouse, List for communicating with real GTA04 owners

2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
> Hi,
> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>
> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>
> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>
> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
> is properly initialized.
>
> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
> right before we see the segfault.
>
> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
> during registration.
>
> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
> I don’t know what it does to the uevent and if it restores previous operation.
>
> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
> by value, but by reference to the di->bat struct:
>
> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>
> So that code called within the context of power_supply_register_no_ws() could already
> refer to initialized di->bat.

Indeed this issue was not present in previous design however I think
the architecture was still error-prone. Starting from driver's probe:
 - some_driver_probe()
   - power_supply_register(&psy)
     - device_add()
       - kobject_uevent() - notify userspace
        - power_supply_uevent()
          - power_supply_show_property()
            - power_supply_get_property()
              - some_driver_get_property()

So before probe() ends the power supply core calls driver's
get_property(). In that time the driver internal structures may not be
ready yet, because the probe did not end. The get_property() could
access some registers or other core functions which will be ready
after power_supply_register() call. For example the
bq27x00_battery_get_property() accesses delayed work which could be
(by mistake) not yet initialized.

I looked at other dev_uevent implementations and almost all of them do
not call back the driver.

Of course this does not change that I introduced the issue and I feel
bad about it.
I got some ideas for resolving it:
1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
member of its state container. This does not solve the issue from
architectural point of view - still some uninitialised data may be
accessed because probe() is in progress. However this would be the
fastest and the least intrusive.

2. Remove calls to get_property() (and other functions provided by
driver) from power_supply_uevent(). Unfortunately this may break
userspace which expects that some data will be present in uevent.

3. Change the API to the previous convention, which you pointed as a remedy:
ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
This also won't solve the issue from 1. that uevent will be called
before probe ends.

4. Block the power_supply_uevent() from calling the get_property()
functions until device_add() finishes. This would solve this issue but
first uevent messages (from adding device) won't contain all of device
data (just like in solution no 2.) and this won't solve other race -
someone may call sysfs show() on created device nodes and thus access
the get_property() before probe finishes.

Any ideas?

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
@ 2015-05-18 12:32   ` Krzysztof Kozlowski
  0 siblings, 0 replies; 12+ messages in thread
From: Krzysztof Kozlowski @ 2015-05-18 12:32 UTC (permalink / raw)
  To: Dr. H. Nikolaus Schaller
  Cc: Krzysztof Kozlowski, Rodolfo Giometti, Sebastian Reichel, LKML,
	Marek Belisko, Linux PM mailing list, Dmitry Eremin-Solenikov,
	David Woodhouse, List for communicating with real GTA04 owners

2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
> Hi,
> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>
> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>
> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>
> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
> is properly initialized.
>
> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
> right before we see the segfault.
>
> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
> during registration.
>
> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
> I don’t know what it does to the uevent and if it restores previous operation.
>
> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
> by value, but by reference to the di->bat struct:
>
> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>
> So that code called within the context of power_supply_register_no_ws() could already
> refer to initialized di->bat.

Indeed this issue was not present in previous design however I think
the architecture was still error-prone. Starting from driver's probe:
 - some_driver_probe()
   - power_supply_register(&psy)
     - device_add()
       - kobject_uevent() - notify userspace
        - power_supply_uevent()
          - power_supply_show_property()
            - power_supply_get_property()
              - some_driver_get_property()

So before probe() ends the power supply core calls driver's
get_property(). In that time the driver internal structures may not be
ready yet, because the probe did not end. The get_property() could
access some registers or other core functions which will be ready
after power_supply_register() call. For example the
bq27x00_battery_get_property() accesses delayed work which could be
(by mistake) not yet initialized.

I looked at other dev_uevent implementations and almost all of them do
not call back the driver.

Of course this does not change that I introduced the issue and I feel
bad about it.
I got some ideas for resolving it:
1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
member of its state container. This does not solve the issue from
architectural point of view - still some uninitialised data may be
accessed because probe() is in progress. However this would be the
fastest and the least intrusive.

2. Remove calls to get_property() (and other functions provided by
driver) from power_supply_uevent(). Unfortunately this may break
userspace which expects that some data will be present in uevent.

3. Change the API to the previous convention, which you pointed as a remedy:
ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
This also won't solve the issue from 1. that uevent will be called
before probe ends.

4. Block the power_supply_uevent() from calling the get_property()
functions until device_add() finishes. This would solve this issue but
first uevent messages (from adding device) won't contain all of device
data (just like in solution no 2.) and this won't solve other race -
someone may call sysfs show() on created device nodes and thus access
the get_property() before probe finishes.

Any ideas?

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
  2015-05-18 12:32   ` Krzysztof Kozlowski
@ 2015-05-18 13:30     ` Dr. H. Nikolaus Schaller
  -1 siblings, 0 replies; 12+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-05-18 13:30 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Rodolfo Giometti, Sebastian Reichel, LKML, Marek Belisko,
	Linux PM mailing list, Dmitry Eremin-Solenikov, David Woodhouse,
	List for communicating with real GTA04 owners

Hi Krzysztof,

Am 18.05.2015 um 14:32 schrieb Krzysztof Kozlowski <k.kozlowski@samsung.com>:

> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>> Hi,
>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>> 
>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>> 
>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>> 
>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>> is properly initialized.
>> 
>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>> right before we see the segfault.
>> 
>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>> during registration.
>> 
>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>> I don’t know what it does to the uevent and if it restores previous operation.
>> 
>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>> by value, but by reference to the di->bat struct:
>> 
>> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>> 
>> So that code called within the context of power_supply_register_no_ws() could already
>> refer to initialized di->bat.
> 
> Indeed this issue was not present in previous design however I think
> the architecture was still error-prone. Starting from driver's probe:
> - some_driver_probe()
>   - power_supply_register(&psy)
>     - device_add()
>       - kobject_uevent() - notify userspace
>        - power_supply_uevent()
>          - power_supply_show_property()
>            - power_supply_get_property()
>              - some_driver_get_property()
> 
> So before probe() ends the power supply core calls driver's
> get_property(). In that time the driver internal structures may not be
> ready yet, because the probe did not end.

Yes, that is indeed a problem. Maybe it did work with the bq27x00 driver
earlier because the call to power_supply_register(&psy) was the last
activity.

> The get_property() could
> access some registers or other core functions which will be ready
> after power_supply_register() call. For example the
> bq27x00_battery_get_property() accesses delayed work which could be
> (by mistake) not yet initialized.
> 
> I looked at other dev_uevent implementations and almost all of them do
> not call back the driver.
> 
> Of course this does not change that I introduced the issue and I feel
> bad about it.
> I got some ideas for resolving it:
> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
> member of its state container. This does not solve the issue from
> architectural point of view - still some uninitialised data may be
> accessed because probe() is in progress. However this would be the
> fastest and the least intrusive.

The problem might be that it fundamentally changes the driver code
architecture. For example one call using di->bat is in

bq27x00_battery_status() {
…
		else if (power_supply_am_i_supplied(di->bat))
			status = POWER_SUPPLY_STATUS_NOT_CHARGING;
…
}

> 
> 2. Remove calls to get_property() (and other functions provided by
> driver) from power_supply_uevent(). Unfortunately this may break
> userspace which expects that some data will be present in uevent.
> 
> 3. Change the API to the previous convention, which you pointed as a remedy:
> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
> This also won't solve the issue from 1. that uevent will be called
> before probe ends.

Well, we could require that power_supply_register_no_ws() is the last
activity to be done in any_driver_probe().

> 
> 4. Block the power_supply_uevent() from calling the get_property()
> functions until device_add() finishes. This would solve this issue but
> first uevent messages (from adding device) won't contain all of device
> data (just like in solution no 2.) and this won't solve other race -
> someone may call sysfs show() on created device nodes and thus access
> the get_property() before probe finishes.

> 
> Any ideas?

5. is it possible to delay the call to kobject_uevent() after some_driver_probe()
is finished? E.g. by registering a one-shot delayed work?

There seems to be a shared workqueue (mentioned e.g. in
<http://www.makelinux.net/ldd3/chp-7-sect-6>)
but that is an area of the kernel I am not familiar with.

BR,
Nikolaus


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
@ 2015-05-18 13:30     ` Dr. H. Nikolaus Schaller
  0 siblings, 0 replies; 12+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-05-18 13:30 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Rodolfo Giometti, Sebastian Reichel, LKML, Marek Belisko,
	Linux PM mailing list, Dmitry Eremin-Solenikov, David Woodhouse,
	List for communicating with real GTA04 owners

Hi Krzysztof,

Am 18.05.2015 um 14:32 schrieb Krzysztof Kozlowski <k.kozlowski@samsung.com>:

> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>> Hi,
>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>> 
>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>> 
>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>> 
>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>> is properly initialized.
>> 
>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>> right before we see the segfault.
>> 
>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>> during registration.
>> 
>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>> I don’t know what it does to the uevent and if it restores previous operation.
>> 
>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>> by value, but by reference to the di->bat struct:
>> 
>> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>> 
>> So that code called within the context of power_supply_register_no_ws() could already
>> refer to initialized di->bat.
> 
> Indeed this issue was not present in previous design however I think
> the architecture was still error-prone. Starting from driver's probe:
> - some_driver_probe()
>   - power_supply_register(&psy)
>     - device_add()
>       - kobject_uevent() - notify userspace
>        - power_supply_uevent()
>          - power_supply_show_property()
>            - power_supply_get_property()
>              - some_driver_get_property()
> 
> So before probe() ends the power supply core calls driver's
> get_property(). In that time the driver internal structures may not be
> ready yet, because the probe did not end.

Yes, that is indeed a problem. Maybe it did work with the bq27x00 driver
earlier because the call to power_supply_register(&psy) was the last
activity.

> The get_property() could
> access some registers or other core functions which will be ready
> after power_supply_register() call. For example the
> bq27x00_battery_get_property() accesses delayed work which could be
> (by mistake) not yet initialized.
> 
> I looked at other dev_uevent implementations and almost all of them do
> not call back the driver.
> 
> Of course this does not change that I introduced the issue and I feel
> bad about it.
> I got some ideas for resolving it:
> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
> member of its state container. This does not solve the issue from
> architectural point of view - still some uninitialised data may be
> accessed because probe() is in progress. However this would be the
> fastest and the least intrusive.

The problem might be that it fundamentally changes the driver code
architecture. For example one call using di->bat is in

bq27x00_battery_status() {
…
		else if (power_supply_am_i_supplied(di->bat))
			status = POWER_SUPPLY_STATUS_NOT_CHARGING;
…
}

> 
> 2. Remove calls to get_property() (and other functions provided by
> driver) from power_supply_uevent(). Unfortunately this may break
> userspace which expects that some data will be present in uevent.
> 
> 3. Change the API to the previous convention, which you pointed as a remedy:
> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
> This also won't solve the issue from 1. that uevent will be called
> before probe ends.

Well, we could require that power_supply_register_no_ws() is the last
activity to be done in any_driver_probe().

> 
> 4. Block the power_supply_uevent() from calling the get_property()
> functions until device_add() finishes. This would solve this issue but
> first uevent messages (from adding device) won't contain all of device
> data (just like in solution no 2.) and this won't solve other race -
> someone may call sysfs show() on created device nodes and thus access
> the get_property() before probe finishes.

> 
> Any ideas?

5. is it possible to delay the call to kobject_uevent() after some_driver_probe()
is finished? E.g. by registering a one-shot delayed work?

There seems to be a shared workqueue (mentioned e.g. in
<http://www.makelinux.net/ldd3/chp-7-sect-6>)
but that is an area of the kernel I am not familiar with.

BR,
Nikolaus


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
  2015-05-18 12:32   ` Krzysztof Kozlowski
@ 2015-05-18 13:32     ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 12+ messages in thread
From: Krzysztof Kozlowski @ 2015-05-18 13:32 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Dr. H. Nikolaus Schaller, Rodolfo Giometti, Sebastian Reichel,
	LKML, Marek Belisko, Linux PM mailing list,
	Dmitry Eremin-Solenikov, David Woodhouse,
	List for communicating with real GTA04 owners

2015-05-18 21:32 GMT+09:00 Krzysztof Kozlowski <k.kozlowski@samsung.com>:
> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>> Hi,
>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>>
>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>>
>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>>
>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>> is properly initialized.
>>
>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>> right before we see the segfault.
>>
>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>> during registration.
>>
>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>> I don’t know what it does to the uevent and if it restores previous operation.
>>
>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>> by value, but by reference to the di->bat struct:
>>
>> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>>
>> So that code called within the context of power_supply_register_no_ws() could already
>> refer to initialized di->bat.
>
> Indeed this issue was not present in previous design however I think
> the architecture was still error-prone. Starting from driver's probe:
>  - some_driver_probe()
>    - power_supply_register(&psy)
>      - device_add()
>        - kobject_uevent() - notify userspace
>         - power_supply_uevent()
>           - power_supply_show_property()
>             - power_supply_get_property()
>               - some_driver_get_property()
>
> So before probe() ends the power supply core calls driver's
> get_property(). In that time the driver internal structures may not be
> ready yet, because the probe did not end. The get_property() could
> access some registers or other core functions which will be ready
> after power_supply_register() call. For example the
> bq27x00_battery_get_property() accesses delayed work which could be
> (by mistake) not yet initialized.
>
> I looked at other dev_uevent implementations and almost all of them do
> not call back the driver.
>
> Of course this does not change that I introduced the issue and I feel
> bad about it.
> I got some ideas for resolving it:
> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
> member of its state container. This does not solve the issue from
> architectural point of view - still some uninitialised data may be
> accessed because probe() is in progress. However this would be the
> fastest and the least intrusive.
>
> 2. Remove calls to get_property() (and other functions provided by
> driver) from power_supply_uevent(). Unfortunately this may break
> userspace which expects that some data will be present in uevent.
>
> 3. Change the API to the previous convention, which you pointed as a remedy:
> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
> This also won't solve the issue from 1. that uevent will be called
> before probe ends.
>
> 4. Block the power_supply_uevent() from calling the get_property()
> functions until device_add() finishes. This would solve this issue but
> first uevent messages (from adding device) won't contain all of device
> data (just like in solution no 2.) and this won't solve other race -
> someone may call sysfs show() on created device nodes and thus access
> the get_property() before probe finishes.

Unfortunately similar issue is with changed_work of power supply. The
power_supply_register() calls at the end power_supply_changed_work()
which through leds calls get_property().

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
@ 2015-05-18 13:32     ` Krzysztof Kozlowski
  0 siblings, 0 replies; 12+ messages in thread
From: Krzysztof Kozlowski @ 2015-05-18 13:32 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Dr. H. Nikolaus Schaller, Rodolfo Giometti, Sebastian Reichel,
	LKML, Marek Belisko, Linux PM mailing list,
	Dmitry Eremin-Solenikov, David Woodhouse,
	List for communicating with real GTA04 owners

2015-05-18 21:32 GMT+09:00 Krzysztof Kozlowski <k.kozlowski@samsung.com>:
> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>> Hi,
>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>>
>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>>
>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>>
>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>> is properly initialized.
>>
>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>> right before we see the segfault.
>>
>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>> during registration.
>>
>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>> I don’t know what it does to the uevent and if it restores previous operation.
>>
>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>> by value, but by reference to the di->bat struct:
>>
>> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>>
>> So that code called within the context of power_supply_register_no_ws() could already
>> refer to initialized di->bat.
>
> Indeed this issue was not present in previous design however I think
> the architecture was still error-prone. Starting from driver's probe:
>  - some_driver_probe()
>    - power_supply_register(&psy)
>      - device_add()
>        - kobject_uevent() - notify userspace
>         - power_supply_uevent()
>           - power_supply_show_property()
>             - power_supply_get_property()
>               - some_driver_get_property()
>
> So before probe() ends the power supply core calls driver's
> get_property(). In that time the driver internal structures may not be
> ready yet, because the probe did not end. The get_property() could
> access some registers or other core functions which will be ready
> after power_supply_register() call. For example the
> bq27x00_battery_get_property() accesses delayed work which could be
> (by mistake) not yet initialized.
>
> I looked at other dev_uevent implementations and almost all of them do
> not call back the driver.
>
> Of course this does not change that I introduced the issue and I feel
> bad about it.
> I got some ideas for resolving it:
> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
> member of its state container. This does not solve the issue from
> architectural point of view - still some uninitialised data may be
> accessed because probe() is in progress. However this would be the
> fastest and the least intrusive.
>
> 2. Remove calls to get_property() (and other functions provided by
> driver) from power_supply_uevent(). Unfortunately this may break
> userspace which expects that some data will be present in uevent.
>
> 3. Change the API to the previous convention, which you pointed as a remedy:
> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
> This also won't solve the issue from 1. that uevent will be called
> before probe ends.
>
> 4. Block the power_supply_uevent() from calling the get_property()
> functions until device_add() finishes. This would solve this issue but
> first uevent messages (from adding device) won't contain all of device
> data (just like in solution no 2.) and this won't solve other race -
> someone may call sysfs show() on created device nodes and thus access
> the get_property() before probe finishes.

Unfortunately similar issue is with changed_work of power supply. The
power_supply_register() calls at the end power_supply_changed_work()
which through leds calls get_property().

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
  2015-05-18 13:30     ` Dr. H. Nikolaus Schaller
@ 2015-05-18 14:09       ` Krzysztof Kozlowski
  -1 siblings, 0 replies; 12+ messages in thread
From: Krzysztof Kozlowski @ 2015-05-18 14:09 UTC (permalink / raw)
  To: Dr. H. Nikolaus Schaller
  Cc: Krzysztof Kozlowski, Rodolfo Giometti, Sebastian Reichel, LKML,
	Marek Belisko, Linux PM mailing list, Dmitry Eremin-Solenikov,
	David Woodhouse, List for communicating with real GTA04 owners

2015-05-18 22:30 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
> Hi Krzysztof,
>
> Am 18.05.2015 um 14:32 schrieb Krzysztof Kozlowski <k.kozlowski@samsung.com>:
>
>> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>>> Hi,
>>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>>>
>>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>>>
>>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>>>
>>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>>> is properly initialized.
>>>
>>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>>> right before we see the segfault.
>>>
>>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>>> during registration.
>>>
>>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>>> I don’t know what it does to the uevent and if it restores previous operation.
>>>
>>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>>> by value, but by reference to the di->bat struct:
>>>
>>> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>>> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>>>
>>> So that code called within the context of power_supply_register_no_ws() could already
>>> refer to initialized di->bat.
>>
>> Indeed this issue was not present in previous design however I think
>> the architecture was still error-prone. Starting from driver's probe:
>> - some_driver_probe()
>>   - power_supply_register(&psy)
>>     - device_add()
>>       - kobject_uevent() - notify userspace
>>        - power_supply_uevent()
>>          - power_supply_show_property()
>>            - power_supply_get_property()
>>              - some_driver_get_property()
>>
>> So before probe() ends the power supply core calls driver's
>> get_property(). In that time the driver internal structures may not be
>> ready yet, because the probe did not end.
>
> Yes, that is indeed a problem. Maybe it did work with the bq27x00 driver
> earlier because the call to power_supply_register(&psy) was the last
> activity.

Right, for most of the drivers it is the last part of probe.

>
>> The get_property() could
>> access some registers or other core functions which will be ready
>> after power_supply_register() call. For example the
>> bq27x00_battery_get_property() accesses delayed work which could be
>> (by mistake) not yet initialized.
>>
>> I looked at other dev_uevent implementations and almost all of them do
>> not call back the driver.
>>
>> Of course this does not change that I introduced the issue and I feel
>> bad about it.
>> I got some ideas for resolving it:
>> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
>> member of its state container. This does not solve the issue from
>> architectural point of view - still some uninitialised data may be
>> accessed because probe() is in progress. However this would be the
>> fastest and the least intrusive.
>
> The problem might be that it fundamentally changes the driver code
> architecture. For example one call using di->bat is in
>
> bq27x00_battery_status() {
> …
>                 else if (power_supply_am_i_supplied(di->bat))
>                         status = POWER_SUPPLY_STATUS_NOT_CHARGING;
> …
> }
>
>>
>> 2. Remove calls to get_property() (and other functions provided by
>> driver) from power_supply_uevent(). Unfortunately this may break
>> userspace which expects that some data will be present in uevent.
>>
>> 3. Change the API to the previous convention, which you pointed as a remedy:
>> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
>> This also won't solve the issue from 1. that uevent will be called
>> before probe ends.
>
> Well, we could require that power_supply_register_no_ws() is the last
> activity to be done in any_driver_probe().

Some drivers (like drivers/power/lp8727_charger.c) register multiple
power supplies and there can be only one last call. What is even
important in this lp8727 case, is that it registers:
 - a battery,
 - two chargers which supply this battery.

So when power supplies are registered, the delayed work is executed
for each supplied battery. Hopefully the the battery is registered as
last... but I am not quite sure that this is still safe.

>> 4. Block the power_supply_uevent() from calling the get_property()
>> functions until device_add() finishes. This would solve this issue but
>> first uevent messages (from adding device) won't contain all of device
>> data (just like in solution no 2.) and this won't solve other race -
>> someone may call sysfs show() on created device nodes and thus access
>> the get_property() before probe finishes.
>
>>
>> Any ideas?
>
> 5. is it possible to delay the call to kobject_uevent() after some_driver_probe()
> is finished? E.g. by registering a one-shot delayed work?
>
> There seems to be a shared workqueue (mentioned e.g. in
> <http://www.makelinux.net/ldd3/chp-7-sect-6>)
> but that is an area of the kernel I am not familiar with.

Unfortunately there is no guarantee that delayed work will be executed
after probe ends. It should be but... who knows when it be scheduled?

Anyway thanks for reporting the issue and ideas.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
@ 2015-05-18 14:09       ` Krzysztof Kozlowski
  0 siblings, 0 replies; 12+ messages in thread
From: Krzysztof Kozlowski @ 2015-05-18 14:09 UTC (permalink / raw)
  To: Dr. H. Nikolaus Schaller
  Cc: Krzysztof Kozlowski, Rodolfo Giometti, Sebastian Reichel, LKML,
	Marek Belisko, Linux PM mailing list, Dmitry Eremin-Solenikov,
	David Woodhouse, List for communicating with real GTA04 owners

2015-05-18 22:30 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
> Hi Krzysztof,
>
> Am 18.05.2015 um 14:32 schrieb Krzysztof Kozlowski <k.kozlowski@samsung.com>:
>
>> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>>> Hi,
>>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>>>
>>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>>>
>>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>>>
>>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>>> is properly initialized.
>>>
>>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>>> right before we see the segfault.
>>>
>>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>>> during registration.
>>>
>>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>>> I don’t know what it does to the uevent and if it restores previous operation.
>>>
>>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>>> by value, but by reference to the di->bat struct:
>>>
>>> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>>> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>>>
>>> So that code called within the context of power_supply_register_no_ws() could already
>>> refer to initialized di->bat.
>>
>> Indeed this issue was not present in previous design however I think
>> the architecture was still error-prone. Starting from driver's probe:
>> - some_driver_probe()
>>   - power_supply_register(&psy)
>>     - device_add()
>>       - kobject_uevent() - notify userspace
>>        - power_supply_uevent()
>>          - power_supply_show_property()
>>            - power_supply_get_property()
>>              - some_driver_get_property()
>>
>> So before probe() ends the power supply core calls driver's
>> get_property(). In that time the driver internal structures may not be
>> ready yet, because the probe did not end.
>
> Yes, that is indeed a problem. Maybe it did work with the bq27x00 driver
> earlier because the call to power_supply_register(&psy) was the last
> activity.

Right, for most of the drivers it is the last part of probe.

>
>> The get_property() could
>> access some registers or other core functions which will be ready
>> after power_supply_register() call. For example the
>> bq27x00_battery_get_property() accesses delayed work which could be
>> (by mistake) not yet initialized.
>>
>> I looked at other dev_uevent implementations and almost all of them do
>> not call back the driver.
>>
>> Of course this does not change that I introduced the issue and I feel
>> bad about it.
>> I got some ideas for resolving it:
>> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
>> member of its state container. This does not solve the issue from
>> architectural point of view - still some uninitialised data may be
>> accessed because probe() is in progress. However this would be the
>> fastest and the least intrusive.
>
> The problem might be that it fundamentally changes the driver code
> architecture. For example one call using di->bat is in
>
> bq27x00_battery_status() {
> …
>                 else if (power_supply_am_i_supplied(di->bat))
>                         status = POWER_SUPPLY_STATUS_NOT_CHARGING;
> …
> }
>
>>
>> 2. Remove calls to get_property() (and other functions provided by
>> driver) from power_supply_uevent(). Unfortunately this may break
>> userspace which expects that some data will be present in uevent.
>>
>> 3. Change the API to the previous convention, which you pointed as a remedy:
>> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
>> This also won't solve the issue from 1. that uevent will be called
>> before probe ends.
>
> Well, we could require that power_supply_register_no_ws() is the last
> activity to be done in any_driver_probe().

Some drivers (like drivers/power/lp8727_charger.c) register multiple
power supplies and there can be only one last call. What is even
important in this lp8727 case, is that it registers:
 - a battery,
 - two chargers which supply this battery.

So when power supplies are registered, the delayed work is executed
for each supplied battery. Hopefully the the battery is registered as
last... but I am not quite sure that this is still safe.

>> 4. Block the power_supply_uevent() from calling the get_property()
>> functions until device_add() finishes. This would solve this issue but
>> first uevent messages (from adding device) won't contain all of device
>> data (just like in solution no 2.) and this won't solve other race -
>> someone may call sysfs show() on created device nodes and thus access
>> the get_property() before probe finishes.
>
>>
>> Any ideas?
>
> 5. is it possible to delay the call to kobject_uevent() after some_driver_probe()
> is finished? E.g. by registering a one-shot delayed work?
>
> There seems to be a shared workqueue (mentioned e.g. in
> <http://www.makelinux.net/ldd3/chp-7-sect-6>)
> but that is an area of the kernel I am not familiar with.

Unfortunately there is no guarantee that delayed work will be executed
after probe ends. It should be but... who knows when it be scheduled?

Anyway thanks for reporting the issue and ideas.

Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
  2015-05-18 14:09       ` Krzysztof Kozlowski
@ 2015-05-18 16:03         ` Dr. H. Nikolaus Schaller
  -1 siblings, 0 replies; 12+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-05-18 16:03 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Rodolfo Giometti, Sebastian Reichel, LKML, Marek Belisko,
	Linux PM mailing list, Dmitry Eremin-Solenikov, David Woodhouse,
	List for communicating with real GTA04 owners


Am 18.05.2015 um 16:09 schrieb Krzysztof Kozlowski <k.kozlowski@samsung.com>:

> 2015-05-18 22:30 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>> Hi Krzysztof,
>> 
>> Am 18.05.2015 um 14:32 schrieb Krzysztof Kozlowski <k.kozlowski@samsung.com>:
>> 
>>> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>>>> Hi,
>>>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>>>> 
>>>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>>>> 
>>>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>>>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>>>> 
>>>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>>>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>>>> is properly initialized.
>>>> 
>>>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>>>> right before we see the segfault.
>>>> 
>>>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>>>> during registration.
>>>> 
>>>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>>>> I don’t know what it does to the uevent and if it restores previous operation.
>>>> 
>>>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>>>> by value, but by reference to the di->bat struct:
>>>> 
>>>> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>>>> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>>>> 
>>>> So that code called within the context of power_supply_register_no_ws() could already
>>>> refer to initialized di->bat.
>>> 
>>> Indeed this issue was not present in previous design however I think
>>> the architecture was still error-prone. Starting from driver's probe:
>>> - some_driver_probe()
>>>  - power_supply_register(&psy)
>>>    - device_add()
>>>      - kobject_uevent() - notify userspace
>>>       - power_supply_uevent()
>>>         - power_supply_show_property()
>>>           - power_supply_get_property()
>>>             - some_driver_get_property()
>>> 
>>> So before probe() ends the power supply core calls driver's
>>> get_property(). In that time the driver internal structures may not be
>>> ready yet, because the probe did not end.
>> 
>> Yes, that is indeed a problem. Maybe it did work with the bq27x00 driver
>> earlier because the call to power_supply_register(&psy) was the last
>> activity.
> 
> Right, for most of the drivers it is the last part of probe.
> 
>> 
>>> The get_property() could
>>> access some registers or other core functions which will be ready
>>> after power_supply_register() call. For example the
>>> bq27x00_battery_get_property() accesses delayed work which could be
>>> (by mistake) not yet initialized.
>>> 
>>> I looked at other dev_uevent implementations and almost all of them do
>>> not call back the driver.
>>> 
>>> Of course this does not change that I introduced the issue and I feel
>>> bad about it.
>>> I got some ideas for resolving it:
>>> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
>>> member of its state container. This does not solve the issue from
>>> architectural point of view - still some uninitialised data may be
>>> accessed because probe() is in progress. However this would be the
>>> fastest and the least intrusive.
>> 
>> The problem might be that it fundamentally changes the driver code
>> architecture. For example one call using di->bat is in
>> 
>> bq27x00_battery_status() {
>> …
>>                else if (power_supply_am_i_supplied(di->bat))
>>                        status = POWER_SUPPLY_STATUS_NOT_CHARGING;
>> …
>> }
>> 
>>> 
>>> 2. Remove calls to get_property() (and other functions provided by
>>> driver) from power_supply_uevent(). Unfortunately this may break
>>> userspace which expects that some data will be present in uevent.
>>> 
>>> 3. Change the API to the previous convention, which you pointed as a remedy:
>>> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
>>> This also won't solve the issue from 1. that uevent will be called
>>> before probe ends.
>> 
>> Well, we could require that power_supply_register_no_ws() is the last
>> activity to be done in any_driver_probe().
> 
> Some drivers (like drivers/power/lp8727_charger.c) register multiple
> power supplies and there can be only one last call. What is even
> important in this lp8727 case, is that it registers:
> - a battery,
> - two chargers which supply this battery.
> 
> So when power supplies are registered, the delayed work is executed
> for each supplied battery. Hopefully the the battery is registered as
> last... but I am not quite sure that this is still safe.
> 
>>> 4. Block the power_supply_uevent() from calling the get_property()
>>> functions until device_add() finishes. This would solve this issue but
>>> first uevent messages (from adding device) won't contain all of device
>>> data (just like in solution no 2.) and this won't solve other race -
>>> someone may call sysfs show() on created device nodes and thus access
>>> the get_property() before probe finishes.
>> 
>>> 
>>> Any ideas?
>> 
>> 5. is it possible to delay the call to kobject_uevent() after some_driver_probe()
>> is finished? E.g. by registering a one-shot delayed work?
>> 
>> There seems to be a shared workqueue (mentioned e.g. in
>> <http://www.makelinux.net/ldd3/chp-7-sect-6>)
>> but that is an area of the kernel I am not familiar with.
> 
> Unfortunately there is no guarantee that delayed work will be executed
> after probe ends. It should be but… who knows when it be scheduled?

Ah, I see. It would be necessary to schedule the worker thread only after
probing is successful. So this would deeply go into general driver probing
mechanisms. Or it would need a “call me as soon as my probe function
has successfully completed” callback that a driver can register somewhere.

> Anyway thanks for reporting the issue and ideas.

Please let me know if we can test something.

BR and thanks,
Nikolaus


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bug: power supply - NULL pointer dereference in bq27x00 driver
@ 2015-05-18 16:03         ` Dr. H. Nikolaus Schaller
  0 siblings, 0 replies; 12+ messages in thread
From: Dr. H. Nikolaus Schaller @ 2015-05-18 16:03 UTC (permalink / raw)
  To: Krzysztof Kozlowski
  Cc: Rodolfo Giometti, Sebastian Reichel, LKML, Marek Belisko,
	Linux PM mailing list, Dmitry Eremin-Solenikov, David Woodhouse,
	List for communicating with real GTA04 owners


Am 18.05.2015 um 16:09 schrieb Krzysztof Kozlowski <k.kozlowski@samsung.com>:

> 2015-05-18 22:30 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>> Hi Krzysztof,
>> 
>> Am 18.05.2015 um 14:32 schrieb Krzysztof Kozlowski <k.kozlowski@samsung.com>:
>> 
>>> 2015-05-18 18:40 GMT+09:00 Dr. H. Nikolaus Schaller <hns@goldelico.com>:
>>>> Hi,
>>>> we tried to upgrade from 4.0 to 4.1-rc3 kernel and ran into a NULL pointer problem within the bq27x00 driver.
>>>> 
>>>> It appears to be introduced by your patch 297d716f6260cc9421d971b124ca196b957ee458
>>>> 
>>>> The problem appears to be that bq27x00_powersupply_init() calls power_supply_register_no_ws() and
>>>> sets di->bat *after* return. The old code did pass an uninitialized struct pointer.
>>>> 
>>>> Now for reasons I don’t understand, the power_supply_register_no_ws() appears to call
>>>> uevent related stuff which in turn calls bq27x00_battery_get_property() before di->bat
>>>> is properly initialized.
>>>> 
>>>> I have checked with printk in bq27x00_battery_get_property() that di>bat == NULL in this case and
>>>> right before we see the segfault.
>>>> 
>>>> The old code simply did pass a zeroed struct power_supply and perhaps initialized its components
>>>> during registration.
>>>> 
>>>> Returning some -EINVAL if di->bat == NULL would likely solve the NULL pointer dereference but
>>>> I don’t know what it does to the uevent and if it restores previous operation.
>>>> 
>>>> It could have been that it was for good purpose that power_supply_register_no_ws() did not return
>>>> by value, but by reference to the di->bat struct:
>>>> 
>>>> -       ret = power_supply_register_no_ws(di->dev, &di->bat, NULL);
>>>> +       di->bat = power_supply_register_no_ws(di->dev, psy_desc, &psy_cfg);
>>>> 
>>>> So that code called within the context of power_supply_register_no_ws() could already
>>>> refer to initialized di->bat.
>>> 
>>> Indeed this issue was not present in previous design however I think
>>> the architecture was still error-prone. Starting from driver's probe:
>>> - some_driver_probe()
>>>  - power_supply_register(&psy)
>>>    - device_add()
>>>      - kobject_uevent() - notify userspace
>>>       - power_supply_uevent()
>>>         - power_supply_show_property()
>>>           - power_supply_get_property()
>>>             - some_driver_get_property()
>>> 
>>> So before probe() ends the power supply core calls driver's
>>> get_property(). In that time the driver internal structures may not be
>>> ready yet, because the probe did not end.
>> 
>> Yes, that is indeed a problem. Maybe it did work with the bq27x00 driver
>> earlier because the call to power_supply_register(&psy) was the last
>> activity.
> 
> Right, for most of the drivers it is the last part of probe.
> 
>> 
>>> The get_property() could
>>> access some registers or other core functions which will be ready
>>> after power_supply_register() call. For example the
>>> bq27x00_battery_get_property() accesses delayed work which could be
>>> (by mistake) not yet initialized.
>>> 
>>> I looked at other dev_uevent implementations and almost all of them do
>>> not call back the driver.
>>> 
>>> Of course this does not change that I introduced the issue and I feel
>>> bad about it.
>>> I got some ideas for resolving it:
>>> 1. Fix bq27x00_battery and maybe others so they won't access the 'psy'
>>> member of its state container. This does not solve the issue from
>>> architectural point of view - still some uninitialised data may be
>>> accessed because probe() is in progress. However this would be the
>>> fastest and the least intrusive.
>> 
>> The problem might be that it fundamentally changes the driver code
>> architecture. For example one call using di->bat is in
>> 
>> bq27x00_battery_status() {
>> …
>>                else if (power_supply_am_i_supplied(di->bat))
>>                        status = POWER_SUPPLY_STATUS_NOT_CHARGING;
>> …
>> }
>> 
>>> 
>>> 2. Remove calls to get_property() (and other functions provided by
>>> driver) from power_supply_uevent(). Unfortunately this may break
>>> userspace which expects that some data will be present in uevent.
>>> 
>>> 3. Change the API to the previous convention, which you pointed as a remedy:
>>> ret = power_supply_register_no_ws(di->dev, &di->bat, psy_desc, &psy_cfg);
>>> This also won't solve the issue from 1. that uevent will be called
>>> before probe ends.
>> 
>> Well, we could require that power_supply_register_no_ws() is the last
>> activity to be done in any_driver_probe().
> 
> Some drivers (like drivers/power/lp8727_charger.c) register multiple
> power supplies and there can be only one last call. What is even
> important in this lp8727 case, is that it registers:
> - a battery,
> - two chargers which supply this battery.
> 
> So when power supplies are registered, the delayed work is executed
> for each supplied battery. Hopefully the the battery is registered as
> last... but I am not quite sure that this is still safe.
> 
>>> 4. Block the power_supply_uevent() from calling the get_property()
>>> functions until device_add() finishes. This would solve this issue but
>>> first uevent messages (from adding device) won't contain all of device
>>> data (just like in solution no 2.) and this won't solve other race -
>>> someone may call sysfs show() on created device nodes and thus access
>>> the get_property() before probe finishes.
>> 
>>> 
>>> Any ideas?
>> 
>> 5. is it possible to delay the call to kobject_uevent() after some_driver_probe()
>> is finished? E.g. by registering a one-shot delayed work?
>> 
>> There seems to be a shared workqueue (mentioned e.g. in
>> <http://www.makelinux.net/ldd3/chp-7-sect-6>)
>> but that is an area of the kernel I am not familiar with.
> 
> Unfortunately there is no guarantee that delayed work will be executed
> after probe ends. It should be but… who knows when it be scheduled?

Ah, I see. It would be necessary to schedule the worker thread only after
probing is successful. So this would deeply go into general driver probing
mechanisms. Or it would need a “call me as soon as my probe function
has successfully completed” callback that a driver can register somewhere.

> Anyway thanks for reporting the issue and ideas.

Please let me know if we can test something.

BR and thanks,
Nikolaus


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2015-05-18 16:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-18  9:40 Bug: power supply - NULL pointer dereference in bq27x00 driver Dr. H. Nikolaus Schaller
2015-05-18  9:40 ` Dr. H. Nikolaus Schaller
2015-05-18 12:32 ` Krzysztof Kozlowski
2015-05-18 12:32   ` Krzysztof Kozlowski
2015-05-18 13:30   ` Dr. H. Nikolaus Schaller
2015-05-18 13:30     ` Dr. H. Nikolaus Schaller
2015-05-18 14:09     ` Krzysztof Kozlowski
2015-05-18 14:09       ` Krzysztof Kozlowski
2015-05-18 16:03       ` Dr. H. Nikolaus Schaller
2015-05-18 16:03         ` Dr. H. Nikolaus Schaller
2015-05-18 13:32   ` Krzysztof Kozlowski
2015-05-18 13:32     ` Krzysztof Kozlowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.