linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
@ 2018-06-13 12:39 H. Nikolaus Schaller
  2018-06-14 12:01 ` Tony Lindgren
  0 siblings, 1 reply; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-13 12:39 UTC (permalink / raw)
  To: Linus Walleij
  Cc: open list:GPIO SUBSYSTEM, Linux Kernel Mailing List,
	Discussions about the Letux Kernel, Tony Lindgren, kernel

Hi Linus,
we have sporadically observed this bug on OMAP3 and OMAP5 machines during boot:

> [    5.988882] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [    6.003089] pgd = (ptrval)
> [    6.011859] [00000000] *pgd=ad268003, *pmd=27f089003
> [    6.025112] ehci-omap 4a064c00.ehci: irq 146, io mem 0x4a064c00
> [    6.032874] omapdss: unknown parameter 'def_disp' ignored
> [    6.040373] bq24296_get_vsys_voltage(0)
> [    6.052902] Internal error: Oops: 207 [#1] PREEMPT SMP ARM
> [    6.058661] Modules linked in: encoder_tpd12s015(+) pwm_omap_dmtimer(+) connector_hdmi(+) omapdss(+) generic_adc_battery pwm_bl(+) omapdss_base cec ehci_omap(+) wlcore_sdio dwc3_omap leds_is31fl319x crtouch_mt tsc2007 bmp280_spi bq2429x_charger(+) bq27xxx_battery_i2c bq27xxx_battery ina2xx tca8418_keypad as5013(+) gpio_twl6040 twl6040_vibra palmas_pwrbutton palmas_gpadc bmc150_magn_i2c bmc150_accel_i2c usb3503 bmp280_i2c bmc150_accel_core bmc150_magn bmp280 industrialio_triggered_buffer w2cbw003_bluetooth bno055 kfifo_buf industrialio snd_soc_omap_mcpdm snd_soc_omap_mcbsp snd_soc_omap snd_pcm_dmaengine
> [    6.114731] CPU: 0 PID: 1307 Comm: udevd Tainted: G        W         4.17.0-letux-lpae+ #2406
> [    6.123661] Hardware name: Generic OMAP5 (Flattened Device Tree)
> [    6.129962] PC is at strcmp+0x0/0x34
> [    6.133716] LR is at pinctrl_get_group_selector+0x44/0x78
> [    6.139362] pc : [<c0849b98>]    lr : [<c0526260>]    psr: a00e0013
> [    6.145920] sp : ed26dd00  ip : ed3ff390  fp : 00000019
> [    6.151389] r10: 00000018  r9 : ed2dfb40  r8 : 00000019
> [    6.156866] r7 : c088aba8  r6 : ebfcdb84  r5 : ee55f880  r4 : 00000017
> [    6.163703] r3 : c05244a4  r2 : 00000000  r1 : ebfcdb84  r0 : 00000000
> [    6.170548] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [    6.178026] Control: 30c5387d  Table: ad245b80  DAC: fffffffd
> [    6.184053] Process udevd (pid: 1307, stack limit = 0x(ptrval))
> [    6.190258] Stack: (0xed26dd00 to 0xed26e000)
> [    6.194823] dd00: ee55f880 ebfcdb84 00000000 ed0e7810 00000000 c0527220 ed2dfd90 00000001
> [    6.203402] dd20: 00000002 ed2dfdc0 ed2dfb80 ed2dfbc0 ed0e7810 00000000 00000000 ed2dfb40
> [    6.211977] dd40: 00000000 c0525a88 00000014 ee263680 c0e4bdf0 c09d4e04 ee22b810 00000000
> [    6.220557] dd60: ed2dfe10 ee22b810 c0ec3aec fffffdfb bf20a014 0000002f 00000000 c0525c50
> [    6.229135] dd80: 00000000 ee22b810 ed2dfe50 c05ead24 ee22b810 00000000 c0ec3af0 c05cbd88
> [    6.237710] dda0: ee22b810 ee22b844 bf20a014 c0e57bf0 bf20a180 bf20a1b0 00000000 c05cc038
> [    6.246275] ddc0: ee22b810 bf20a014 c05cbfb8 c05ca51c ee0dde58 ee1f5434 bf20a014 00000000
> [    6.254856] dde0: ed15df80 c05cb3b4 bf2092de bf2092df 00000000 bf20a014 c0e8a360 bf20d000
> [    6.263432] de00: 00000000 c05ccbdc c05cd400 bf20a080 c0e8a360 c0202c74 014000c0 c0840fd4
> [    6.272006] de20: bf20a0c8 ed3ff688 ed3ff5c0 c09d32e4 ed313ec0 00000000 00000000 ed313e80
> [    6.280582] de40: ed313e80 a00f0113 ed313140 ed313ec0 ee000000 014000c0 bf20a080 ed26df50
> [    6.289157] de60: bf20a08c bf20a080 ed26df50 ed313140 00000000 c02a098c bf20a080 ed313140
> [    6.297739] de80: bf20a080 ed26df50 bf20a08c c029f7d4 ffff8000 00007fff bf20a080 c029cb04
> [    6.306309] dea0: 00000064 c085b2f4 f0b05460 b6dc49f8 00000000 00000003 ed1d8240 000024b0
> [    6.314875] dec0: 00000000 c0347e60 00000003 00000000 00000000 00000000 00000000 00000000
> [    6.323454] dee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    6.332028] df00: 00000000 00000000 7fffffff 00000000 b6dc49f8 00000006 0000017b c02011e4
> [    6.340601] df20: ed26c000 00000000 000545c0 c029fb40 7fffffff 00000000 00000003 c03104bc
> [    6.349179] df40: 00000002 f0b03000 000024b0 00000000 f0b03c20 f0b03000 000024b0 f0b04fb0
> [    6.357754] df60: f0b04077 f0b04800 00003000 00003120 00000000 00000000 00000000 00001b08
> [    6.366323] df80: 0000001e 0000001f 00000016 00000000 00000012 00000000 00000000 bed4e004
> [    6.374899] dfa0: 00000000 c0201000 00000000 bed4e004 00000006 b6dc49f8 00000000 00000000
> [    6.383480] dfc0: 00000000 bed4e004 00000000 0000017b 00020000 00037f78 00050048 000545c0
> [    6.392056] dfe0: bed4dee0 bed4ded0 b6dbec4b b6ec9a42 800f0030 00000006 00000000 00000000
> [    6.400640] [<c0849b98>] (strcmp) from [<c0526260>] (pinctrl_get_group_selector+0x44/0x78)
> [    6.409317] [<c0526260>] (pinctrl_get_group_selector) from [<c0527220>] (pinmux_map_to_setting+0x158/0x1a0)
> [    6.419533] [<c0527220>] (pinmux_map_to_setting) from [<c0525a88>] (create_pinctrl+0x1f0/0x2f8)
> [    6.428664] [<c0525a88>] (create_pinctrl) from [<c0525c50>] (devm_pinctrl_get+0x2c/0x6c)
> [    6.437146] [<c0525c50>] (devm_pinctrl_get) from [<c05ead24>] (pinctrl_bind_pins+0x3c/0x138)
> [    6.446008] [<c05ead24>] (pinctrl_bind_pins) from [<c05cbd88>] (driver_probe_device+0xe8/0x318)
> [    6.455131] [<c05cbd88>] (driver_probe_device) from [<c05cc038>] (__driver_attach+0x80/0xa4)
> [    6.463983] [<c05cc038>] (__driver_attach) from [<c05ca51c>] (bus_for_each_dev+0x58/0x7c)
> [    6.472556] [<c05ca51c>] (bus_for_each_dev) from [<c05cb3b4>] (bus_add_driver+0xcc/0x1e0)
> [    6.481143] [<c05cb3b4>] (bus_add_driver) from [<c05ccbdc>] (driver_register+0x9c/0xe0)
> [    6.489545] [<c05ccbdc>] (driver_register) from [<c0202c74>] (do_one_initcall+0xb4/0x248)
> [    6.498136] [<c0202c74>] (do_one_initcall) from [<c02a098c>] (do_init_module+0x58/0x1d0)
> [    6.506614] [<c02a098c>] (do_init_module) from [<c029f7d4>] (load_module+0xe04/0xfb0)
> [    6.514821] [<c029f7d4>] (load_module) from [<c029fb40>] (sys_finit_module+0x88/0x90)
> [    6.523041] [<c029fb40>] (sys_finit_module) from [<c0201000>] (ret_fast_syscall+0x0/0x4c)
> [    6.531604] Exception stack(0xed26dfa8 to 0xed26dff0)
> [    6.536899] dfa0:                   00000000 bed4e004 00000006 b6dc49f8 00000000 00000000
> [    6.545472] dfc0: 00000000 bed4e004 00000000 0000017b 00020000 00037f78 00050048 000545c0
> [    6.554037] dfe0: bed4dee0 bed4ded0 b6dbec4b b6ec9a42
> [    6.559339] Code: e3520000 e5e32001 1afffffb e12fff1e (e4d03001) 
> [    6.567709] ---[ end trace 1a9ebd5c1c7fe0e4 ]---

It is initially triggered by udev loading and probing a module.

Here is another bug trace:

> [   13.671844] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [   13.693389] pgd = (ptrval)
> [   13.696380] [00000000] *pgd=00000000
> [   13.700134] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
> [   13.705780] Modules linked in: snd_soc_gtm601 pwm_omap_dmtimer connector_analog_tv generic_adc_battery pwm_bl bq27xxx_battery_hdq bq27xxx_battery omap3_isp(+) videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common omap_hdq omap2430 snd_soc_omap_mcbsp(+) snd_soc_omap snd_pcm_dmaengine bmp280_i2c(+) bmp280 ov9655(+) v4l2_fwnode v4l2_common itg3200 at24 videodev phy_twl4030_usb tsc2007 hmc5843_i2c hmc5843_core bma180 musb_hdrc industrialio_triggered_buffer lis3lv02d_i2c media leds_tca6507 lis3lv02d kfifo_buf input_polldev gpio_twl4030 snd_soc_twl4030 twl4030_vibra twl4030_charger twl4030_pwrbutton twl4030_madc industrialio w2sg0004 gps_core w2cbw003_bluetooth ehci_omap omapdss omapdss_base cec
> [   13.771331] CPU: 0 PID: 937 Comm: kworker/0:2 Not tainted 4.17.0-rc2-letux+ #2290
> [   13.779174] Hardware name: Generic OMAP36xx (Flattened Device Tree)
> [   13.785736] Workqueue: events deferred_probe_work_func
> [   13.791107] PC is at strcmp+0x0/0x34
> [   13.794860] LR is at pinctrl_get_group_selector+0x6c/0xa8
> [   13.800506] pc : [<c0705afc>]    lr : [<c0429840>]    psr: 600f0013
> [   13.807067] sp : dddf9e08  ip : 00000000  fp : 00000011
> [   13.812530] r10: 0000000f  r9 : 00000010  r8 : c0745b0c
> [   13.817993] r7 : 00000000  r6 : ddc41600  r5 : df9c5274  r4 : 0000000e
> [   13.824829] r3 : 600f0013  r2 : 00000002  r1 : df9c5274  r0 : 00000000
> [   13.831665] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [   13.839111] Control: 10c5387d  Table: 9c88c019  DAC: 00000051
> [   13.845123] Process kworker/0:2 (pid: 937, stack limit = 0x(ptrval))
> [   13.851776] Stack: (0xdddf9e08 to 0xdddfa000)
> [   13.856323] 9e00:                   dddf9e2c ddc41600 df9c5274 00000000 dcc99f10 00000000
> [   13.864868] 9e20: dcc98b80 c042a808 dcc84110 00000001 00000002 dcc98e40 dcc5bd80 dcc85f40
> [   13.873413] 9e40: dcc99f10 00000000 00000000 dcc98b80 00000000 c0429040 00000014 dda424c0
> [   13.881988] 9e60: c0a58df0 c0876b8c dda47c10 00000000 dca09110 dda47c10 c0aca804 fffffdfb
> [   13.890563] 9e80: bf282014 0000002c c0a94390 c0429208 00000000 dda47c10 dca09150 c04bb85c
> [   13.899108] 9ea0: dda47c10 00000000 c0aca808 c049fb58 00000000 dddf9ee8 c049fe64 dda47c44
> [   13.907714] 9ec0: df9bbe00 c0a02d00 00000000 c049e3b8 dd81be6c dcc3d738 dda47c10 c0a63258
> [   13.916259] 9ee0: 00000001 c049f9d8 dda47c10 00000001 00000000 dda47c10 c0a63258 dda47c10
> [   13.924804] 9f00: c0a9dba0 c049ef94 dda47c10 c0a63044 c0a63060 c049f4a4 c049f3b8 dde97900
> [   13.933380] 9f20: c0a63078 df9b8c00 00000000 df9bbe00 c0a02d00 c0145dac dde97900 c0a63078
> [   13.941955] 9f40: ffff9023 dde97900 df9b8c00 df9b8c00 dddf8000 df9b8c18 c0a02d00 dde97918
> [   13.950500] 9f60: 00000008 c0146580 dda48240 dde09ec0 dde16640 00000000 dde97900 c01462c0
> [   13.959075] 9f80: dd8b1ef0 dde09edc 00000000 c014a598 dde16640 c014a464 00000000 00000000
> [   13.967620] 9fa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
> [   13.976196] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [   13.984741] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> [   13.993316] [<c0705afc>] (strcmp) from [<c0429840>] (pinctrl_get_group_selector+0x6c/0xa8)
> [   14.001953] [<c0429840>] (pinctrl_get_group_selector) from [<c042a808>] (pinmux_map_to_setting+0x158/0x1a0)
> [   14.012145] [<c042a808>] (pinmux_map_to_setting) from [<c0429040>] (create_pinctrl+0x1f0/0x2f8)
> [   14.021240] [<c0429040>] (create_pinctrl) from [<c0429208>] (devm_pinctrl_get+0x2c/0x6c)
> [   14.029724] [<c0429208>] (devm_pinctrl_get) from [<c04bb85c>] (pinctrl_bind_pins+0x3c/0x138)
> [   14.038574] [<c04bb85c>] (pinctrl_bind_pins) from [<c049fb58>] (driver_probe_device+0xe8/0x318)
> [   14.047668] [<c049fb58>] (driver_probe_device) from [<c049e3b8>] (bus_for_each_drv+0x84/0x94)
> [   14.056610] [<c049e3b8>] (bus_for_each_drv) from [<c049f9d8>] (__device_attach+0x88/0xfc)
> [   14.065155] [<c049f9d8>] (__device_attach) from [<c049ef94>] (bus_probe_device+0x28/0x80)
> [   14.073730] [<c049ef94>] (bus_probe_device) from [<c049f4a4>] (deferred_probe_work_func+0xec/0x120)
> [   14.083221] [<c049f4a4>] (deferred_probe_work_func) from [<c0145dac>] (process_one_work+0x244/0x464)
> [   14.092803] [<c0145dac>] (process_one_work) from [<c0146580>] (worker_thread+0x2c0/0x3ec)
> [   14.101348] [<c0146580>] (worker_thread) from [<c014a598>] (kthread+0x134/0x150)
> [   14.109100] [<c014a598>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
> [   14.116638] Exception stack(0xdddf9fb0 to 0xdddf9ff8)
> [   14.121917] 9fa0:                                     00000000 00000000 00000000 00000000
> [   14.130462] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [   14.139038] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [   14.145935] Code: e3520000 e5e32001 1afffffb e12fff1e (e4d03001) 
> [   14.299560] omap-mcbsp 49022000.mcbsp: ASoC: Failed to create component debugfs directory

This time it comes from the deferred_probe_work_func().

It does not happen always and depends on which drivers are auto-probed from device tree and
in which sequence (which depends a little on random hardware influences).

Now my theory, after inserting some printk and digging through the code, is:
* the pinctrl-group handling is not thread safe but udev module probing and deferred_probe_work_func
  can run cuncurrently
* the radix tree (pin_group_tree) gets garbled (and is maybe not even the correct data structure)

Here are some details of my thoughts:

First of all, there is a warning that radix trees do not do any locking internally:

https://lwn.net/Articles/175432/

"One should note that none of the radix tree functions perform any sort of locking internally.
It is up to the caller to ensure that multiple threads do not corrupt the tree or get into
other sorts of unpleasant race conditions."

Now if I look into pinctrl_generic_add_group() and pinctrl_generic_get_group_name(),
pctldev->num_groups++ is not protected if pinctrl_generic_add_group() may be called by
two threads in parallel for the same pctldev. Hence a second thread may try to insert
a different node into the radix tree at the same selector index. This fails but there
is no error check - and the second entry is completely missing (but probably assumed to
be there).

Even worse in practise (examples above) seems to be that a driver failing to probe
(-EPROBE_DEFER) should do a call to pinctrl_generic_remove_group() in addition to an initial
pinctrl_generic_add_group(). This may happen multiple times in succession until the
driver finally probes successfully. Each time, num_groups is incremented/decremented.
So if two threads are working on driver probing the pctldev->num_groups may go back
and forward at unexpected times changing the insertion point of the other thread.

[Note: I could not verify that pinctrl_generic_remove_group() is ever called].

Another observation is that the code seems to assume that the selector numbers are
always in sequence and only the last one added is deleted by pinctrl_generic_remove_group().

But if I assume that a deferred driver waits quite a while before deciding to -EPROBE_DEFER,
other drivers may have been probed successfully by the other thread and have incremented
num_groups by more than 1 . Therefore the driver calling pinctrl_generic_remove_group() is
no longer the last one added.

In other words, the assumption seems to be that a radix-tree is an array where deleting
one entry in the middle reduces the index of the following ones and the reduced counter
gives the index of the first free index. But it is a sparse array and indexes are not
modified by insert/remove.

The result is that if several drivers are concurrently being probed (deferred_probe_work_func
executed concurrently to udevd probing), the assignment of the selector (aka radix tree
slot) defined by pctldev->num_groups may become wrong.

So entries may get lost and return NULL for gname in pinctrl_get_group_selector()
and as a result we see the strcmp(NULL). This is my theory about these kernel oopses.

Unfortunately I have no good idea how a solution could look like. One thing is
to add locking but there is a note in the comments of the pinctrl_generic functions
and there is mutex_lock(&pinctrl_maps_mutex); So there is something, but I am not
sure if all call paths are locked.

The other is to get the index/selection assignment right (which may even fail with
proper locking of the radix tree), maybe by using a different data structure than
a radix-tree.

Another observation after adding a printk to pinctrl_generic_add_group() is:

root@letux:~# dmesg|fgrep pinctrl_generic_add_group
[    0.749481] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_hsusb2_pins selector=0
[    0.751464] pinctrl_generic_add_group: pctldev=ee33a000 name=pinmux_hsusb2_2_pins selector=0
[    0.752380] pinctrl_generic_add_group: pctldev=ee3f1e80 name=pinmux_mcbsp1_devconf0_pins selector=0
[    0.753143] pinctrl_generic_add_group: pctldev=ee3f1d80 name=pinmux_tv_acbias_devconf1_pins selector=0
[    0.770294] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_uart1_pins selector=1
[    0.772094] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_uart2_pins selector=2
[    0.773254] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_uart3_pins selector=3
[    1.800659] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_twl4030_pins selector=4
[    1.809997] pinctrl_generic_add_group: pctldev=ee33a080 name=pinmux_twl4030_vpins selector=0
[    1.972412] pinctrl_generic_add_group: pctldev=ee33a000 name=spi_gpio_pinmux selector=1
[    2.167144] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_mmc1_pins selector=5
[    2.204040] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_wlan_irq_pin selector=6
[    2.379150] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_gpmc_pins selector=7
[    2.774627] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_wlan_pins selector=8
[    2.791015] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_bt_pins selector=9
[    2.806610] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_wlan_irq_pin selector=10
[    5.240020] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_dss_dpi_pins selector=11
[    5.874664] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_mcbsp1_pins selector=12
[    5.988159] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_mcbsp2_pins selector=13
[    5.997283] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_mcbsp3_pins selector=14
[    6.089141] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_penirq_pins selector=14
[    6.149841] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_mcbsp4_pins selector=15
[    6.205993] pinctrl_generic_add_group: pctldev=ee33a200 name=hdq_pins selector=17
[    6.390899] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=18
[    6.580169] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=19

^^^ pin group added twice

[    6.890899] pinctrl_generic_add_group: pctldev=ee33a200 name=backlight_pins_pinmux selector=20
[    6.957489] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=20

^^^ same selector number allocated twice

[    6.985687] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=21
[    7.020538] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=22

^^^ 23 missing
[    7.123748] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=24
[    7.158721] pinctrl_generic_add_group: pctldev=ee33a200 name=backlight_pins_pinmux selector=25

^^^ another pin group added twice

[    7.195953] pinctrl_generic_add_group: pctldev=ee33a200 name=modem_pins selector=26
[    7.213104] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=26

^^^ same selector allocated twice, 27 is missing

[    7.239318] pinctrl_generic_add_group: pctldev=ee33a200 name=backlight_pins_pinmux selector=28

^^^ repetition

[    7.287414] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=29
[    7.328491] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=30
[    7.663543] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=31
[    7.693725] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=32
[    7.999755] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=33
[    8.091491] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_pps_pins selector=34
[    8.100219] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=34

^^^ another duplicate

[    8.197174] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=36
[    8.928436] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=37
[    9.465606] pinctrl_generic_add_group: pctldev=ee33a200 name=pinmux_camera_pins selector=38
root@letux:~# 

Which means that some groups are added multiple times and cleanup on EPROBE_DEFER doesn't work.

Well, this is my theory - which may be wrong. The only fact is the problem with strcmp(NULL, ...)
sporadically occuring.

So I hope that someone can make sense out of these findings and propose a patch to fix the real
bug (which may exist for quite a while and became more visible in 4.17-rc1 - but bisect attempts
failed due to the randomness).

BR and thanks,
Nikolaus




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-13 12:39 BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing H. Nikolaus Schaller
@ 2018-06-14 12:01 ` Tony Lindgren
  2018-06-14 12:12   ` H. Nikolaus Schaller
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Lindgren @ 2018-06-14 12:01 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Linus Walleij, open list:GPIO SUBSYSTEM,
	Linux Kernel Mailing List, Discussions about the Letux Kernel,
	kernel

* H. Nikolaus Schaller <hns@goldelico.com> [180613 12:41]:
> 
> Now if I look into pinctrl_generic_add_group() and pinctrl_generic_get_group_name(),
> pctldev->num_groups++ is not protected if pinctrl_generic_add_group() may be called by
> two threads in parallel for the same pctldev. Hence a second thread may try to insert
> a different node into the radix tree at the same selector index. This fails but there
> is no error check - and the second entry is completely missing (but probably assumed to
> be there).

Sounds like pinctrl-single.c is missing mutex around calls to
pinctrl_generic_add_group()?

Regards,

Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-14 12:01 ` Tony Lindgren
@ 2018-06-14 12:12   ` H. Nikolaus Schaller
  2018-06-15  6:58     ` Tony Lindgren
  0 siblings, 1 reply; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-14 12:12 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Linus Walleij, open list:GPIO SUBSYSTEM,
	Linux Kernel Mailing List, Discussions about the Letux Kernel,
	kernel

Hi Tony,

> Am 14.06.2018 um 14:01 schrieb Tony Lindgren <tony@atomide.com>:
> 
> * H. Nikolaus Schaller <hns@goldelico.com> [180613 12:41]:
>> 
>> Now if I look into pinctrl_generic_add_group() and pinctrl_generic_get_group_name(),
>> pctldev->num_groups++ is not protected if pinctrl_generic_add_group() may be called by
>> two threads in parallel for the same pctldev. Hence a second thread may try to insert
>> a different node into the radix tree at the same selector index. This fails but there
>> is no error check - and the second entry is completely missing (but probably assumed to
>> be there).
> 
> Sounds like pinctrl-single.c is missing mutex around calls to
> pinctrl_generic_add_group()?

Yes, that could be. I didn't research the call path, just the one of
devm_pinctrl_get(). That uses a mutex in

https://elixir.bootlin.com/linux/v4.17.1/source/drivers/pinctrl/core.c#L1021

Maybe a similar mutex is missing elsewhere.

BR,
Nikolaus


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-14 12:12   ` H. Nikolaus Schaller
@ 2018-06-15  6:58     ` Tony Lindgren
  2018-06-15 11:13       ` Tony Lindgren
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Lindgren @ 2018-06-15  6:58 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Linus Walleij, open list:GPIO SUBSYSTEM,
	Linux Kernel Mailing List, Discussions about the Letux Kernel,
	kernel

* H. Nikolaus Schaller <hns@goldelico.com> [180614 12:15]:
> Hi Tony,
> 
> > Am 14.06.2018 um 14:01 schrieb Tony Lindgren <tony@atomide.com>:
> > 
> > * H. Nikolaus Schaller <hns@goldelico.com> [180613 12:41]:
> >> 
> >> Now if I look into pinctrl_generic_add_group() and pinctrl_generic_get_group_name(),
> >> pctldev->num_groups++ is not protected if pinctrl_generic_add_group() may be called by
> >> two threads in parallel for the same pctldev. Hence a second thread may try to insert
> >> a different node into the radix tree at the same selector index. This fails but there
> >> is no error check - and the second entry is completely missing (but probably assumed to
> >> be there).
> > 
> > Sounds like pinctrl-single.c is missing mutex around calls to
> > pinctrl_generic_add_group()?
> 
> Yes, that could be. I didn't research the call path, just the one of
> devm_pinctrl_get(). That uses a mutex in

In addition to missing mutex lock around the generic pinctrl functions
we also have racy helpers pinctrl_generic_remove_last_group() and
pinmux_generic_remove_last_function() like you pointed out. I'll post
a patch for you later on today to test.

Regards,

Tony


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-15  6:58     ` Tony Lindgren
@ 2018-06-15 11:13       ` Tony Lindgren
  2018-06-15 11:18         ` H. Nikolaus Schaller
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Lindgren @ 2018-06-15 11:13 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: open list:GPIO SUBSYSTEM, Linus Walleij,
	Discussions about the Letux Kernel, Linux Kernel Mailing List,
	kernel

* Tony Lindgren <tony@atomide.com> [180615 07:00]:
> * H. Nikolaus Schaller <hns@goldelico.com> [180614 12:15]:
> > Hi Tony,
> > 
> > > Am 14.06.2018 um 14:01 schrieb Tony Lindgren <tony@atomide.com>:
> > > 
> > > * H. Nikolaus Schaller <hns@goldelico.com> [180613 12:41]:
> > >> 
> > >> Now if I look into pinctrl_generic_add_group() and pinctrl_generic_get_group_name(),
> > >> pctldev->num_groups++ is not protected if pinctrl_generic_add_group() may be called by
> > >> two threads in parallel for the same pctldev. Hence a second thread may try to insert
> > >> a different node into the radix tree at the same selector index. This fails but there
> > >> is no error check - and the second entry is completely missing (but probably assumed to
> > >> be there).
> > > 
> > > Sounds like pinctrl-single.c is missing mutex around calls to
> > > pinctrl_generic_add_group()?
> > 
> > Yes, that could be. I didn't research the call path, just the one of
> > devm_pinctrl_get(). That uses a mutex in
> 
> In addition to missing mutex lock around the generic pinctrl functions
> we also have racy helpers pinctrl_generic_remove_last_group() and
> pinmux_generic_remove_last_function() like you pointed out. I'll post
> a patch for you later on today to test.

OK I posted a series to fix these issues hopefully as thread
"[PATCH 0/5] pinctrl fixes for generic functions and groups".

Can you please test and see if that is enough to fix the issues
you're seeing?

Regards,

Tony


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-15 11:13       ` Tony Lindgren
@ 2018-06-15 11:18         ` H. Nikolaus Schaller
  2018-06-16 11:07           ` [Letux-kernel] " H. Nikolaus Schaller
  0 siblings, 1 reply; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-15 11:18 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: open list:GPIO SUBSYSTEM, Linus Walleij,
	Discussions about the Letux Kernel, Linux Kernel Mailing List,
	kernel

Hi Tony,

> Am 15.06.2018 um 13:13 schrieb Tony Lindgren <tony@atomide.com>:
> 
> * Tony Lindgren <tony@atomide.com> [180615 07:00]:
>> * H. Nikolaus Schaller <hns@goldelico.com> [180614 12:15]:
>>> Hi Tony,
>>> 
>>>> Am 14.06.2018 um 14:01 schrieb Tony Lindgren <tony@atomide.com>:
>>>> 
>>>> * H. Nikolaus Schaller <hns@goldelico.com> [180613 12:41]:
>>>>> 
>>>>> Now if I look into pinctrl_generic_add_group() and pinctrl_generic_get_group_name(),
>>>>> pctldev->num_groups++ is not protected if pinctrl_generic_add_group() may be called by
>>>>> two threads in parallel for the same pctldev. Hence a second thread may try to insert
>>>>> a different node into the radix tree at the same selector index. This fails but there
>>>>> is no error check - and the second entry is completely missing (but probably assumed to
>>>>> be there).
>>>> 
>>>> Sounds like pinctrl-single.c is missing mutex around calls to
>>>> pinctrl_generic_add_group()?
>>> 
>>> Yes, that could be. I didn't research the call path, just the one of
>>> devm_pinctrl_get(). That uses a mutex in
>> 
>> In addition to missing mutex lock around the generic pinctrl functions
>> we also have racy helpers pinctrl_generic_remove_last_group() and
>> pinmux_generic_remove_last_function() like you pointed out. I'll post
>> a patch for you later on today to test.
> 
> OK I posted a series to fix these issues hopefully as thread
> "[PATCH 0/5] pinctrl fixes for generic functions and groups".

Fine, I have located them.

> 
> Can you please test and see if that is enough to fix the issues
> you're seeing?

Yes, I'll try asap.

BR and thanks,
Nikolaus


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-15 11:18         ` H. Nikolaus Schaller
@ 2018-06-16 11:07           ` H. Nikolaus Schaller
  2018-06-18  6:46             ` Tony Lindgren
  2018-06-18  8:22             ` Andy Shevchenko
  0 siblings, 2 replies; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-16 11:07 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: open list:GPIO SUBSYSTEM, Linus Walleij,
	Linux Kernel Mailing List, kernel,
	Discussions about the Letux Kernel

Hi Tony,

> Am 15.06.2018 um 13:18 schrieb H. Nikolaus Schaller <hns@goldelico.com>:
> 
> Hi Tony,
> 
>> Am 15.06.2018 um 13:13 schrieb Tony Lindgren <tony@atomide.com>:
>> 
>> * Tony Lindgren <tony@atomide.com> [180615 07:00]:
>> 
>> OK I posted a series to fix these issues hopefully as thread
>> "[PATCH 0/5] pinctrl fixes for generic functions and groups".
>> 
>> Can you please test and see if that is enough to fix the issues
>> you're seeing?
> 
> Yes, I'll try asap.

Ok, I tried on OMAP3 (GTA04A5 board).

First boot was ok, but second did show again the strcmp(NULL) :(

Interestingly, adding my printk() to pinctrl_generic_add_group makes it disappear.
(I think the printk adds more delay to one of the code paths and avoids the race or
problematic concurrency).

There is also good news: the selectors are now assigned in strict sequence.
This is a big improvement:

root@letux:~# dmesg|fgrep generic_add
[    0.739501] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_hsusb2_pins selector=0
[    0.741638] pinctrl_generic_add_group: pctldev=ee481f80 name=pinmux_hsusb2_2_pins selector=0
[    0.742401] pinctrl_generic_add_group: pctldev=ee481e00 name=pinmux_mcbsp1_devconf0_pins selector=0
[    0.743072] pinctrl_generic_add_group: pctldev=ee481d00 name=pinmux_tv_acbias_devconf1_pins selector=0
[    0.760437] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_uart1_pins selector=1
[    0.762268] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_uart2_pins selector=2
[    0.763610] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_uart3_pins selector=3
[    1.790527] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_twl4030_pins selector=4
[    1.799835] pinctrl_generic_add_group: pctldev=ee3cd000 name=pinmux_twl4030_vpins selector=0
[    1.969635] pinctrl_generic_add_group: pctldev=ee481f80 name=spi_gpio_pinmux selector=1
[    2.164093] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mmc1_pins selector=5
[    2.198242] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_wlan_irq_pin selector=6
[    2.374328] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_gpmc_pins selector=7
[    2.769989] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_wlan_pins selector=8
[    2.785888] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_bt_pins selector=9
[    2.802154] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_wlan_irq_pin selector=10
[    5.415222] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_dss_dpi_pins selector=11
[    6.080566] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mcbsp1_pins selector=12
[    6.182617] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mcbsp2_pins selector=13
[    6.271240] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mcbsp3_pins selector=14
[    6.330718] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_penirq_pins selector=15
[    6.368927] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mcbsp4_pins selector=16
[    6.616546] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=17
[    6.835418] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=18
[    6.874694] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=19
[    6.900299] pinctrl_generic_add_group: pctldev=ee3cd180 name=hdq_pins selector=20
[    6.937042] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=21
[    7.062744] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=22
[    7.108032] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=23
[    7.145080] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=24
[    7.197723] pinctrl_generic_add_group: pctldev=ee3cd180 name=backlight_pins_pinmux selector=25
[    7.427398] pinctrl_generic_add_group: pctldev=ee3cd180 name=modem_pins selector=26
[    7.476562] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=27
[    7.505981] pinctrl_generic_add_group: pctldev=ee3cd180 name=backlight_pins_pinmux selector=28
[    7.564910] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=29
[    7.894439] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=30
[    8.207489] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=31
[    8.311096] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_pps_pins selector=32
[    8.340667] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=33
[    8.414428] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=34
[    9.231445] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=35
[    9.446716] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=36
root@letux:~# 

But it looks as if we still have duplicate assignments by deferred probing, i.e. some cleanup is
missing (or is this intended behaviour?).

Regarding the strcmp(NULL) faults, it seems as we still have the situation that there are
invalid selector numbers stored somewhere and used by generic driver probing code (i.e. before
the driver is modprobed):

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 4.17.0-letux+ (hns@iMac.fritz.box) (gcc version 4.9.2 (GCC)) #2453 SMP PREEMPT Fri Jun 15 20:34:03 CEST 2018
[    0.000000] CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] OF: fdt: Machine model: Goldelico GTA04A5/Letux 2804

...

[    6.884918] pinctrl_generic_get_group_name: group>name is NULL
[    6.933441] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    6.947204] pinctrl_get_group_selector: strcmp: (null) modem_pins
[    6.980468]   get_group_name = pinctrl_generic_get_group_name+0x0/0x5c
[    7.009735]   pctldev = 0xee396180
[    7.013336] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[    7.099395] pgd = (ptrval)
[    7.102233] [00000000] *pgd=be11a831
[    7.106018] Internal error: Oops: 17 [#1] PREEMPT SMP ARM
[    7.111663] Modules linked in: pps_gpio(+) panel_tpo_td028ttec1 snd_soc_simple_card(+) snd_soc_simple_card_utils snd_soc_omap_twl4030(+) pps_core encoder_opa362 wwan_on_off(+) snd_soc_gtm601 pwm_omap_dmtimer connector_analog_tv generic_adc_battery pwm_bl bq27xxx_battery_hdq bq27xxx_battery wlcore_sdio bmp280_spi omap_hdq omap2430 ov9655 v4l2_fwnode v4l2_common snd_soc_omap_mcbsp snd_soc_omap snd_pcm_dmaengine bmp280_i2c bmp280 videodev at24 leds_tca6507 tsc2007 phy_twl4030_usb bmc150_accel_i2c bmc150_magn_i2c media bmc150_accel_core bmc150_magn industrialio_triggered_buffer kfifo_buf musb_hdrc gpio_twl4030 snd_soc_twl4030 twl4030_pwrbutton twl4030_charger twl4030_vibra twl4030_madc industrialio gnss_w2sg0004 ehci_omap w2cbw003_bluetooth gnss omapdss omapdss_base cec
[    7.182952] CPU: 0 PID: 1198 Comm: udevd Not tainted 4.17.0-letux+ #2453
[    7.189971] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[    7.196563] PC is at strcmp+0x0/0x34
[    7.200317] LR is at pinctrl_get_group_selector+0x84/0xc8
[    7.205993] pc : [<c0758ec0>]    lr : [<c042d9bc>]    psr: 600f0013
[    7.212554] sp : ed019cf8  ip : 00000000  fp : 0000001d
[    7.218017] r10: 0000001c  r9 : 0000001d  r8 : 00000000
[    7.223480] r7 : c0799fac  r6 : ee396180  r5 : ef7c4b74  r4 : 0000001a
[    7.230316] r3 : 00000000  r2 : 00000000  r1 : ef7c4b74  r0 : 00000000
[    7.237182] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    7.244659] Control: 10c5387d  Table: ad01c019  DAC: 00000051
[    7.250671] Process udevd (pid: 1198, stack limit = 0x(ptrval))
[    7.256866] Stack: (0xed019cf8 to 0xed01a000)
[    7.261444] 9ce0:                                                       ed019d1c ee396180
[    7.270019] 9d00: ef7c4b74 00000000 ed0fc810 00000000 ed20d740 c042e990 ed20d990 00000001
[    7.278594] 9d20: 00000002 ed20d9c0 ed20d780 ed20d7c0 ed0fc810 00000000 00000000 ed20d740
[    7.287139] 9d40: 00000000 c042d1a4 00000014 ee23aac0 c0b59e30 c08e9f64 ee23ca10 00000000
[    7.295715] 9d60: ed20dad0 ee23ca10 c0bd1df0 fffffdfb bf248020 00000030 00000000 c042d36c
[    7.304290] 9d80: 00000000 ee23ca10 ed20db50 c04f2e70 ee23ca10 00000000 c0bd1df4 c04d3e8c
[    7.312866] 9da0: ee23ca10 ee23ca44 bf248020 c0b666c8 bf248180 bf2481b0 00000000 c04d413c
[    7.321441] 9dc0: ee23ca10 bf248020 c04d40bc c04d2620 ee01be58 ee25e034 bf248020 00000000
[    7.330017] 9de0: ed0fc000 c04d34b8 bf247727 bf247728 00000000 bf248020 c0b98de0 bf24b000
[    7.338592] 9e00: 00000000 c04d4ce0 c04d5500 bf248080 c0b98de0 c0102d74 014000c0 c07502e0
[    7.347167] 9e20: bf2480c8 ed22e2c8 ed22e200 c08e85a2 ed1fb080 bf248080 00000001 ed22e2e4
[    7.355743] 9e40: ed1fb0c0 a0070113 ed1fb100 ed1fb080 ee000000 014000c0 bf248080 ed019f50
[    7.364318] 9e60: bf24808c bf248080 ed019f50 ed1fb100 00000000 c01a7174 bf248080 ed1fb100
[    7.372894] 9e80: bf248080 ed019f50 bf24808c c01a5fa4 ffff8000 00007fff bf248080 c01a32d4
[    7.381469] 9ea0: 00000075 c076a678 f0bb090c b6e119f8 00000000 00000003 ee76e780 0000295c
[    7.390045] 9ec0: 00000000 c024cf6c 00000003 00000000 00000000 00000000 00000000 00000000
[    7.398620] 9ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    7.407196] 9f00: 00000000 00000000 7fffffff 00000000 b6e119f8 00000006 0000017b c01011e4
[    7.415771] 9f20: ed018000 00000000 00058378 c01a6328 7fffffff 00000000 00000003 c021707c
[    7.424316] 9f40: 00000002 f0bae000 0000295c 00000000 f0baebbe f0bae000 0000295c f0bb0394
[    7.432891] 9f60: f0baf1b7 f0bafa98 00003000 000031b0 00000000 00000000 00000000 00001c1c
[    7.441467] 9f80: 00000023 00000024 0000001b 00000000 00000017 00000000 b6e127d4 00051f70
[    7.450042] 9fa0: 17964f00 c0101000 b6e127d4 00051f70 00000006 b6e119f8 00000000 b6e1231c
[    7.458618] 9fc0: b6e127d4 00051f70 17964f00 0000017b 00020000 00037f78 00050048 00058378
[    7.467193] 9fe0: bee21ee0 bee21ed0 b6e0bc4b b6f16a42 60070030 00000006 00000000 00000000
[    7.475769] [<c0758ec0>] (strcmp) from [<c042d9bc>] (pinctrl_get_group_selector+0x84/0xc8)
[    7.484436] [<c042d9bc>] (pinctrl_get_group_selector) from [<c042e990>] (pinmux_map_to_setting+0x158/0x1a0)
[    7.494659] [<c042e990>] (pinmux_map_to_setting) from [<c042d1a4>] (create_pinctrl+0x1f0/0x2f8)
[    7.503784] [<c042d1a4>] (create_pinctrl) from [<c042d36c>] (devm_pinctrl_get+0x2c/0x6c)
[    7.512268] [<c042d36c>] (devm_pinctrl_get) from [<c04f2e70>] (pinctrl_bind_pins+0x3c/0x138)
[    7.521118] [<c04f2e70>] (pinctrl_bind_pins) from [<c04d3e8c>] (driver_probe_device+0xe8/0x318)
[    7.530242] [<c04d3e8c>] (driver_probe_device) from [<c04d413c>] (__driver_attach+0x80/0xa4)
[    7.539093] [<c04d413c>] (__driver_attach) from [<c04d2620>] (bus_for_each_dev+0x58/0x7c)
[    7.547668] [<c04d2620>] (bus_for_each_dev) from [<c04d34b8>] (bus_add_driver+0xcc/0x1e0)
[    7.556243] [<c04d34b8>] (bus_add_driver) from [<c04d4ce0>] (driver_register+0x9c/0xe0)
[    7.564636] [<c04d4ce0>] (driver_register) from [<c0102d74>] (do_one_initcall+0xb4/0x248)
[    7.573211] [<c0102d74>] (do_one_initcall) from [<c01a7174>] (do_init_module+0x58/0x1d0)
[    7.581695] [<c01a7174>] (do_init_module) from [<c01a5fa4>] (load_module+0xe04/0xfb0)
[    7.589904] [<c01a5fa4>] (load_module) from [<c01a6328>] (sys_finit_module+0x88/0x90)
[    7.598114] [<c01a6328>] (sys_finit_module) from [<c0101000>] (ret_fast_syscall+0x0/0x54)
[    7.606689] Exception stack(0xed019fa8 to 0xed019ff0)
[    7.611999] 9fa0:                   b6e127d4 00051f70 00000006 b6e119f8 00000000 b6e1231c
[    7.620574] 9fc0: b6e127d4 00051f70 17964f00 0000017b 00020000 00037f78 00050048 00058378
[    7.629119] 9fe0: bee21ee0 bee21ed0 b6e0bc4b b6f16a42
[    7.634429] Code: e3520000 e5e32001 1afffffb e12fff1e (e4d03001) 
[    8.755401] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    8.764160] ---[ end trace e60e3511cff37b8a ]---

Can this be a side-effect of the duplicate selector assigments?

Let's assume a first probe attempt does:

[    6.616546] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=17

This fails and is probed again doing:

[    6.835418] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=18
[    6.874694] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=19

Then, a different driver does

[    6.900299] pinctrl_generic_add_group: pctldev=ee3cd180 name=hdq_pins selector=20

And calls pinctrl_get_group_selector() to find itself in the list by pinmux_map_to_setting().

For this the code in pinctrl_get_group_selector() loops over all 21 known selector numbers,
calling pctlops->get_group_name(pctldev, group_selector) for each one.

If slots 17 and 18 do not really exist (or are deleted after failed probing), there is
a NULL return. Which triggers the strcmp(NULL), because the strcmp is not guarded against
non-existing slots in the radix tree.

So a simple workaround could be:

		if (gname && !strcmp(gname, pin_group)) {

But I think the fundamental problem is that the same driver assigns multiple slots if
probing is deferred.

Therefore, I tried to find out more why this happens. Here are some more observations by
studying the code and adding a dump_stack() inside pinctrl_generic_add_group():

1.  the records stored in the radix tree are allocated through devm_kzalloc() within
    pinctrl_generic_add_group().
2.  I have not found a mechanism that removes them from the radix tree if they are
    devm freed which IMHO happens if the probe fails (and all devm objects are
    rewound).
3.  I could not observe calls to pinctrl_generic_remove_group()
4.  this means a stale pointer to the group_desc is still stored in the radix tree
    if driver probing fails
5.  scanning through the radix tree for a matching gname in pinctrl_get_group_selector()
    accesses this stale radix tree entry.
    It has either been overwritten by something else - or contains a dangling pointer for
    group->name. This explains the randomness of the problem but that it is also repeatable
    to some extent. For a pure race of some 100 instructions it happens IMHO too often.
6.  the pinctrl_generic_add_group() is called from create_pinctrl() through 
    pinctrl_dt_to_map() and pcs_dt_node_to_map(), i.e. before the pinctrl_maps_mutex
    is locked in create_pinctrl(). This means that pinctrl_generic_add_group() is
    not locked in this case.

I hope this helps to pinpoint and solve the remaining bugs.

BR and thanks,
Nikolaus


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-16 11:07           ` [Letux-kernel] " H. Nikolaus Schaller
@ 2018-06-18  6:46             ` Tony Lindgren
  2018-06-18  8:22             ` Andy Shevchenko
  1 sibling, 0 replies; 22+ messages in thread
From: Tony Lindgren @ 2018-06-18  6:46 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: open list:GPIO SUBSYSTEM, Linus Walleij,
	Linux Kernel Mailing List, kernel,
	Discussions about the Letux Kernel

* H. Nikolaus Schaller <hns@goldelico.com> [180616 11:10]:
> There is also good news: the selectors are now assigned in strict sequence.
> This is a big improvement:

OK thanks for testing.

> If slots 17 and 18 do not really exist (or are deleted after failed probing), there is
> a NULL return. Which triggers the strcmp(NULL), because the strcmp is not guarded against
> non-existing slots in the radix tree.
> 
> So a simple workaround could be:
> 
> 		if (gname && !strcmp(gname, pin_group)) {
> 
> But I think the fundamental problem is that the same driver assigns multiple slots if
> probing is deferred.
> 
> Therefore, I tried to find out more why this happens. Here are some more observations by
> studying the code and adding a dump_stack() inside pinctrl_generic_add_group():
> 
> 1.  the records stored in the radix tree are allocated through devm_kzalloc() within
>     pinctrl_generic_add_group().
> 2.  I have not found a mechanism that removes them from the radix tree if they are
>     devm freed which IMHO happens if the probe fails (and all devm objects are
>     rewound).
> 3.  I could not observe calls to pinctrl_generic_remove_group()
> 4.  this means a stale pointer to the group_desc is still stored in the radix tree
>     if driver probing fails
> 5.  scanning through the radix tree for a matching gname in pinctrl_get_group_selector()
>     accesses this stale radix tree entry.
>     It has either been overwritten by something else - or contains a dangling pointer for
>     group->name. This explains the randomness of the problem but that it is also repeatable
>     to some extent. For a pure race of some 100 instructions it happens IMHO too often.
> 6.  the pinctrl_generic_add_group() is called from create_pinctrl() through 
>     pinctrl_dt_to_map() and pcs_dt_node_to_map(), i.e. before the pinctrl_maps_mutex
>     is locked in create_pinctrl(). This means that pinctrl_generic_add_group() is
>     not locked in this case.
> 
> I hope this helps to pinpoint and solve the remaining bugs.

Yes sounds like that's where things still go wrong. I'll take a look
if it makes sense to assign the selector from the pin controller
driver instead or just ignoring the holes.

I'll try to debug the devm issue too.

Regards,

Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-16 11:07           ` [Letux-kernel] " H. Nikolaus Schaller
  2018-06-18  6:46             ` Tony Lindgren
@ 2018-06-18  8:22             ` Andy Shevchenko
  2018-06-18  9:14               ` Tony Lindgren
  1 sibling, 1 reply; 22+ messages in thread
From: Andy Shevchenko @ 2018-06-18  8:22 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Tony Lindgren, open list:GPIO SUBSYSTEM, Linus Walleij,
	Linux Kernel Mailing List, kernel,
	Discussions about the Letux Kernel

On Sat, Jun 16, 2018 at 2:08 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:

> [    0.739501] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_hsusb2_pins selector=0
> [    0.741638] pinctrl_generic_add_group: pctldev=ee481f80 name=pinmux_hsusb2_2_pins selector=0
> [    0.742401] pinctrl_generic_add_group: pctldev=ee481e00 name=pinmux_mcbsp1_devconf0_pins selector=0
> [    0.743072] pinctrl_generic_add_group: pctldev=ee481d00 name=pinmux_tv_acbias_devconf1_pins selector=0
> [    0.760437] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_uart1_pins selector=1
> [    0.762268] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_uart2_pins selector=2
> [    0.763610] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_uart3_pins selector=3
> [    1.790527] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_twl4030_pins selector=4
> [    1.799835] pinctrl_generic_add_group: pctldev=ee3cd000 name=pinmux_twl4030_vpins selector=0
> [    1.969635] pinctrl_generic_add_group: pctldev=ee481f80 name=spi_gpio_pinmux selector=1
> [    2.164093] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mmc1_pins selector=5
> [    2.198242] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_wlan_irq_pin selector=6
> [    2.374328] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_gpmc_pins selector=7
> [    2.769989] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_wlan_pins selector=8
> [    2.785888] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_bt_pins selector=9
> [    2.802154] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_wlan_irq_pin selector=10
> [    5.415222] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_dss_dpi_pins selector=11
> [    6.080566] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mcbsp1_pins selector=12
> [    6.182617] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mcbsp2_pins selector=13
> [    6.271240] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mcbsp3_pins selector=14
> [    6.330718] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_penirq_pins selector=15
> [    6.368927] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_mcbsp4_pins selector=16
> [    6.616546] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=17
> [    6.835418] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=18
> [    6.874694] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=19
> [    6.900299] pinctrl_generic_add_group: pctldev=ee3cd180 name=hdq_pins selector=20
> [    6.937042] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=21
> [    7.062744] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=22
> [    7.108032] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=23
> [    7.145080] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=24
> [    7.197723] pinctrl_generic_add_group: pctldev=ee3cd180 name=backlight_pins_pinmux selector=25
> [    7.427398] pinctrl_generic_add_group: pctldev=ee3cd180 name=modem_pins selector=26
> [    7.476562] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=27
> [    7.505981] pinctrl_generic_add_group: pctldev=ee3cd180 name=backlight_pins_pinmux selector=28
> [    7.564910] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=29
> [    7.894439] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=30
> [    8.207489] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=31
> [    8.311096] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_pps_pins selector=32
> [    8.340667] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=33
> [    8.414428] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=34
> [    9.231445] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=35
> [    9.446716] pinctrl_generic_add_group: pctldev=ee3cd180 name=pinmux_camera_pins selector=36
> root@letux:~#
>
> But it looks as if we still have duplicate assignments by deferred probing, i.e. some cleanup is
> missing (or is this intended behaviour?).

> But I think the fundamental problem is that the same driver assigns multiple slots if
> probing is deferred.

Indeed.

I think there is a simple way to clean up pinctrl stuff on failed probe. See
https://elixir.bootlin.com/linux/v4.18-rc1/source/drivers/base/dd.c#L416

We only bind pins, and do not perform any actions when failure happens later on.

> 3.  I could not observe calls to pinctrl_generic_remove_group()
> 4.  this means a stale pointer to the group_desc is still stored in the radix tree
>     if driver probing fails

Exactly.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-18  8:22             ` Andy Shevchenko
@ 2018-06-18  9:14               ` Tony Lindgren
  2018-06-18  9:29                 ` H. Nikolaus Schaller
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Lindgren @ 2018-06-18  9:14 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: H. Nikolaus Schaller, open list:GPIO SUBSYSTEM, Linus Walleij,
	Linux Kernel Mailing List, kernel,
	Discussions about the Letux Kernel

* Andy Shevchenko <andy.shevchenko@gmail.com> [180618 08:25]:
> On Sat, Jun 16, 2018 at 2:08 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> > But it looks as if we still have duplicate assignments by deferred probing, i.e. some cleanup is
> > missing (or is this intended behaviour?).
> 
> > But I think the fundamental problem is that the same driver assigns multiple slots if
> > probing is deferred.
> 
> Indeed.
> 
> I think there is a simple way to clean up pinctrl stuff on failed probe. See
> https://elixir.bootlin.com/linux/v4.18-rc1/source/drivers/base/dd.c#L416
> 
> We only bind pins, and do not perform any actions when failure happens later on.

Yup seems like a good approach. I'll take a look if we can just
check if the function or group name already exists and return
the existing selector in that case.

Regards,

Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-18  9:14               ` Tony Lindgren
@ 2018-06-18  9:29                 ` H. Nikolaus Schaller
  2018-06-18  9:54                   ` Tony Lindgren
  0 siblings, 1 reply; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-18  9:29 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Andy Shevchenko, open list:GPIO SUBSYSTEM, Linus Walleij,
	Linux Kernel Mailing List, kernel,
	Discussions about the Letux Kernel


> Am 18.06.2018 um 11:14 schrieb Tony Lindgren <tony@atomide.com>:
> 
> * Andy Shevchenko <andy.shevchenko@gmail.com> [180618 08:25]:
>> On Sat, Jun 16, 2018 at 2:08 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>>> But it looks as if we still have duplicate assignments by deferred probing, i.e. some cleanup is
>>> missing (or is this intended behaviour?).
>> 
>>> But I think the fundamental problem is that the same driver assigns multiple slots if
>>> probing is deferred.
>> 
>> Indeed.
>> 
>> I think there is a simple way to clean up pinctrl stuff on failed probe. See
>> https://elixir.bootlin.com/linux/v4.18-rc1/source/drivers/base/dd.c#L416
>> 
>> We only bind pins, and do not perform any actions when failure happens later on.
> 
> Yup seems like a good approach. I'll take a look if we can just
> check if the function or group name already exists and return
> the existing selector in that case.

Ok, that would also solve the duplication issue.

On the other hand we still have a stale entry if the probing process
finally fails after several attempts.

This may happen if a driver with a valid DT entry is blacklisted in
/etc/modprobe.d/blacklist.conf. Then, the kernel will try to modprobe
it several times through udev until it gives up. The reason seems to
be that the deferred probing thread does not know why the driver did
not probe successfully.

BR and thanks,
Nikolaus

> 
> Regards,
> 
> Tony


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-18  9:29                 ` H. Nikolaus Schaller
@ 2018-06-18  9:54                   ` Tony Lindgren
  2018-06-18  9:59                     ` H. Nikolaus Schaller
       [not found]                     ` <CA+Ot1OxNj30KgHWJHXgWbnXYkNXTNvd_t-b6+uG3u=z8+pZf2Q@mail.gmail.com>
  0 siblings, 2 replies; 22+ messages in thread
From: Tony Lindgren @ 2018-06-18  9:54 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Andy Shevchenko, open list:GPIO SUBSYSTEM, Linus Walleij,
	Linux Kernel Mailing List, kernel,
	Discussions about the Letux Kernel

* H. Nikolaus Schaller <hns@goldelico.com> [180618 09:32]:
> 
> > Am 18.06.2018 um 11:14 schrieb Tony Lindgren <tony@atomide.com>:
> > 
> > * Andy Shevchenko <andy.shevchenko@gmail.com> [180618 08:25]:
> >> On Sat, Jun 16, 2018 at 2:08 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
> >>> But it looks as if we still have duplicate assignments by deferred probing, i.e. some cleanup is
> >>> missing (or is this intended behaviour?).
> >> 
> >>> But I think the fundamental problem is that the same driver assigns multiple slots if
> >>> probing is deferred.
> >> 
> >> Indeed.
> >> 
> >> I think there is a simple way to clean up pinctrl stuff on failed probe. See
> >> https://elixir.bootlin.com/linux/v4.18-rc1/source/drivers/base/dd.c#L416
> >> 
> >> We only bind pins, and do not perform any actions when failure happens later on.
> > 
> > Yup seems like a good approach. I'll take a look if we can just
> > check if the function or group name already exists and return
> > the existing selector in that case.
> 
> Ok, that would also solve the duplication issue.

Below is an incremental patch to check for existing entries. Care
to test again?

If that works, I'll fold it into the patch series and repost the
whole series.

> On the other hand we still have a stale entry if the probing process
> finally fails after several attempts.
>
> This may happen if a driver with a valid DT entry is blacklisted in
> /etc/modprobe.d/blacklist.conf. Then, the kernel will try to modprobe
> it several times through udev until it gives up. The reason seems to
> be that the deferred probing thread does not know why the driver did
> not probe successfully.

Hmm I think this might be fixed then too. Then on pinctrl module
unbind/unload we should free the radix tree entries if that is
not yet done. Seems we may only free them on -ENOMEM right now.

Regards,

Tony

8< ------------------------
diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
--- a/drivers/pinctrl/core.c
+++ b/drivers/pinctrl/core.c
@@ -617,6 +617,26 @@ struct group_desc *pinctrl_generic_get_group(struct pinctrl_dev *pctldev,
 }
 EXPORT_SYMBOL_GPL(pinctrl_generic_get_group);
 
+static int pinctrl_generic_group_name_to_selector(struct pinctrl_dev *pctldev,
+						  const char *function)
+{
+	const struct pinctrl_ops *ops = pctldev->desc->pctlops;
+	unsigned ngroups = ops->get_groups_count(pctldev);
+	unsigned selector = 0;
+
+	/* See if this pctldev has this group */
+	while (selector < ngroups) {
+		const char *gname = ops->get_group_name(pctldev, selector);
+
+		if (!strcmp(function, gname))
+			return selector;
+
+		selector++;
+	}
+
+	return -EINVAL;
+}
+
 /**
  * pinctrl_generic_add_group() - adds a new pin group
  * @pctldev: pin controller device
@@ -631,7 +651,13 @@ int pinctrl_generic_add_group(struct pinctrl_dev *pctldev, const char *name,
 			      int *pins, int num_pins, void *data)
 {
 	struct group_desc *group;
-	int selector = pctldev->num_groups;
+	int selector;
+
+	selector = pinctrl_generic_group_name_to_selector(pctldev, name);
+	if (selector >= 0)
+		return selector;
+
+	selector = pctldev->num_groups;
 
 	group = devm_kzalloc(pctldev->dev, sizeof(*group), GFP_KERNEL);
 	if (!group)
diff --git a/drivers/pinctrl/pinmux.c b/drivers/pinctrl/pinmux.c
--- a/drivers/pinctrl/pinmux.c
+++ b/drivers/pinctrl/pinmux.c
@@ -308,7 +308,6 @@ static int pinmux_func_name_to_selector(struct pinctrl_dev *pctldev,
 		selector++;
 	}
 
-	dev_err(pctldev->dev, "function '%s' not supported\n", function);
 	return -EINVAL;
 }
 
@@ -775,7 +774,13 @@ int pinmux_generic_add_function(struct pinctrl_dev *pctldev,
 				void *data)
 {
 	struct function_desc *function;
-	int selector = pctldev->num_functions;
+	int selector;
+
+	selector = pinmux_func_name_to_selector(pctldev, name);
+	if (selector >= 0)
+		return selector;
+
+	selector = pctldev->num_functions;
 
 	function = devm_kzalloc(pctldev->dev, sizeof(*function), GFP_KERNEL);
 	if (!function)
-- 
2.17.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-18  9:54                   ` Tony Lindgren
@ 2018-06-18  9:59                     ` H. Nikolaus Schaller
       [not found]                     ` <CA+Ot1OxNj30KgHWJHXgWbnXYkNXTNvd_t-b6+uG3u=z8+pZf2Q@mail.gmail.com>
  1 sibling, 0 replies; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-18  9:59 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Andy Shevchenko, open list:GPIO SUBSYSTEM, Linus Walleij,
	Linux Kernel Mailing List, kernel,
	Discussions about the Letux Kernel


> Am 18.06.2018 um 11:54 schrieb Tony Lindgren <tony@atomide.com>:
> 
> * H. Nikolaus Schaller <hns@goldelico.com> [180618 09:32]:
>> 
>>> Am 18.06.2018 um 11:14 schrieb Tony Lindgren <tony@atomide.com>:
>>> 
>>> * Andy Shevchenko <andy.shevchenko@gmail.com> [180618 08:25]:
>>>> On Sat, Jun 16, 2018 at 2:08 PM H. Nikolaus Schaller <hns@goldelico.com> wrote:
>>>>> But it looks as if we still have duplicate assignments by deferred probing, i.e. some cleanup is
>>>>> missing (or is this intended behaviour?).
>>>> 
>>>>> But I think the fundamental problem is that the same driver assigns multiple slots if
>>>>> probing is deferred.
>>>> 
>>>> Indeed.
>>>> 
>>>> I think there is a simple way to clean up pinctrl stuff on failed probe. See
>>>> https://elixir.bootlin.com/linux/v4.18-rc1/source/drivers/base/dd.c#L416
>>>> 
>>>> We only bind pins, and do not perform any actions when failure happens later on.
>>> 
>>> Yup seems like a good approach. I'll take a look if we can just
>>> check if the function or group name already exists and return
>>> the existing selector in that case.
>> 
>> Ok, that would also solve the duplication issue.
> 
> Below is an incremental patch to check for existing entries.

Thanks!

> Care
> to test again?

Yes, asap. It is the most critical bug I currently know for all our OMAP
devices...

> 
> If that works, I'll fold it into the patch series and repost the
> whole series.
> 
>> On the other hand we still have a stale entry if the probing process
>> finally fails after several attempts.
>> 
>> This may happen if a driver with a valid DT entry is blacklisted in
>> /etc/modprobe.d/blacklist.conf. Then, the kernel will try to modprobe
>> it several times through udev until it gives up. The reason seems to
>> be that the deferred probing thread does not know why the driver did
>> not probe successfully.
> 
> Hmm I think this might be fixed then too. Then on pinctrl module
> unbind/unload we should free the radix tree entries if that is
> not yet done. Seems we may only free them on -ENOMEM right now.

Yes, that could do proper cleanup.

I think all the issues were introduced by deferred probing and the
pinctrl code is safe if that does not exist... So we probably should
think about backporting to stable. But let's test it outside the
trees and have it mature in linux-next for a while.

BR and thanks,
Nikolaus


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
       [not found]                     ` <CA+Ot1OxNj30KgHWJHXgWbnXYkNXTNvd_t-b6+uG3u=z8+pZf2Q@mail.gmail.com>
@ 2018-06-18 11:51                       ` Tony Lindgren
  2018-06-18 16:43                         ` H. Nikolaus Schaller
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Lindgren @ 2018-06-18 11:51 UTC (permalink / raw)
  To: Christ van Willegen
  Cc: Discussions about the Letux Kernel, Linus Walleij,
	Linux Kernel Mailing List, open list:GPIO SUBSYSTEM,
	Andy Shevchenko, kernel

* Christ van Willegen <cvwillegen@gmail.com> [180618 10:01]:
> > --- a/drivers/pinctrl/core.c
> > +++ b/drivers/pinctrl/core.c
> > +               if (!strcmp(function, gname))
> 
> 
> This could never fail? gname guaranteed to never == NULL?

Thanks I'll add checks to make sure we always have a valid
name for new functions and groups added.

Regards,

Tony



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-18 11:51                       ` Tony Lindgren
@ 2018-06-18 16:43                         ` H. Nikolaus Schaller
  2018-06-18 18:17                           ` Tony Lindgren
  0 siblings, 1 reply; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-18 16:43 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Christ van Willegen, Linus Walleij, Linux Kernel Mailing List,
	open list:GPIO SUBSYSTEM, Andy Shevchenko, kernel,
	Discussions about the Letux Kernel

Hi Tony,

> Am 18.06.2018 um 13:51 schrieb Tony Lindgren <tony@atomide.com>:
> 
> * Christ van Willegen <cvwillegen@gmail.com> [180618 10:01]:
>>> --- a/drivers/pinctrl/core.c
>>> +++ b/drivers/pinctrl/core.c
>>> +               if (!strcmp(function, gname))
>> 
>> 
>> This could never fail? gname guaranteed to never == NULL?

If there are no duplicates and stale entries any more, gname must
be valid.

> 
> Thanks I'll add checks to make sure we always have a valid
> name for new functions and groups added.

Well, not having checks is the cheapest debugging tool that does not
even need to be enabled because strcmp(NULL) checks come for free :)

Jokes aside, I now have tested the latest patch and it seems to solve the issues :) :) :)

I can also demonstrate that the duplication has gone:

root@letux:~# dmesg|fgrep generic_add
[    0.747436] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_hsusb2_pins selector=0
[    0.750671] pinctrl_generic_add_group: pctldev=ee486f80 name=pinmux_hsusb2_2_pins selector=0
[    0.751403] pinctrl_generic_add_group: pctldev=ee486e00 name=pinmux_mcbsp1_devconf0_pins selector=0
[    0.752227] pinctrl_generic_add_group: pctldev=ee486d00 name=pinmux_tv_acbias_devconf1_pins selector=0
[    0.769104] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_uart1_pins selector=1
[    0.770874] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_uart2_pins selector=2
[    0.772003] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_uart3_pins selector=3
[    1.799316] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_twl4030_pins selector=4
[    1.808624] pinctrl_generic_add_group: pctldev=ee446000 name=pinmux_twl4030_vpins selector=0
[    1.978912] pinctrl_generic_add_group: pctldev=ee486f80 name=spi_gpio_pinmux selector=1
[    2.173522] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mmc1_pins selector=5
[    2.207916] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_wlan_irq_pin selector=6
[    2.383666] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_gpmc_pins selector=7
[    2.778808] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_wlan_pins selector=8
[    2.794708] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_bt_pins selector=9
[    5.405273] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_dss_dpi_pins selector=10
[    6.051666] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mcbsp1_pins selector=11
[    6.136688] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mcbsp2_pins selector=12
[    6.222015] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mcbsp3_pins selector=13
[    6.261199] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_penirq_pins selector=14
[    6.288787] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_mcbsp4_pins selector=15
[    6.336700] pinctrl_generic_add_group: pctldev=ee446180 name=hdq_pins selector=16
[    6.607788] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_camera_pins selector=17
[    6.900848] pinctrl_generic_add_group: pctldev=ee446180 name=backlight_pins_pinmux selector=18
[    7.298217] pinctrl_generic_add_group: pctldev=ee446180 name=modem_pins selector=19
[    8.134826] pinctrl_generic_add_group: pctldev=ee446180 name=pinmux_pps_pins selector=20
root@letux:~#

And I was no longer able to reproduce the strcmp(NULL) issue. So it is either better hidden
or gone.

So code just needs group cleanup on failed probing and fixing the mutex around pinctrl_generic_add_group().

I think we need the mutex because a race still can happen when create_pinctrl() is calling pcs_dt_node_to_map()
and pinctrl_generic_add_group() w/o being locked on pinctrl_maps_mutex.

The race I suspect is that two drivers are trying to insert the same name and may come
both to the conclusion that it does not yet exist. And both insert into the radix tree.

The window of risk is small though... It is in pinctrl_generic_add_group() between calling
pinctrl_generic_group_name_to_selector() and radix_tree_insert() so we probably won't
see it in real hardware tests.

Anyhow, I'll apply the patches as "hacks" to the LetuxOS kernel (4.17.x and 4.18-rc1) so that
more users can test. And we can replace them if they are finally accepted upstream.

BR and thanks for fixing the bug,
Nikolaus


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-18 16:43                         ` H. Nikolaus Schaller
@ 2018-06-18 18:17                           ` Tony Lindgren
  2018-06-18 18:30                             ` H. Nikolaus Schaller
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Lindgren @ 2018-06-18 18:17 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Christ van Willegen, Linus Walleij, Linux Kernel Mailing List,
	open list:GPIO SUBSYSTEM, Andy Shevchenko, kernel,
	Discussions about the Letux Kernel

* H. Nikolaus Schaller <hns@goldelico.com> [180618 16:46]:

> 
> I can also demonstrate that the duplication has gone:

OK good to hear.

> And I was no longer able to reproduce the strcmp(NULL) issue. So it is either better hidden
> or gone.

It should not be possible with checks preventing registering
a group or function with no name. I'll try to repost the whole
series tomorrow with that added.

> So code just needs group cleanup on failed probing and fixing the mutex around pinctrl_generic_add_group().
> 
> I think we need the mutex because a race still can happen when create_pinctrl() is calling pcs_dt_node_to_map()
> and pinctrl_generic_add_group() w/o being locked on pinctrl_maps_mutex.
> 
> The race I suspect is that two drivers are trying to insert the same name and may come
> both to the conclusion that it does not yet exist. And both insert into the radix tree.
> 
> The window of risk is small though... It is in pinctrl_generic_add_group() between calling
> pinctrl_generic_group_name_to_selector() and radix_tree_insert() so we probably won't
> see it in real hardware tests.

Hmm but that race should be already fixed with mutex held
by the pin controller drivers with these fixes? Or am I
missing something still?

Regards,

Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-18 18:17                           ` Tony Lindgren
@ 2018-06-18 18:30                             ` H. Nikolaus Schaller
  2018-06-19  4:34                               ` Tony Lindgren
  0 siblings, 1 reply; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-18 18:30 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Christ van Willegen, Linus Walleij, Linux Kernel Mailing List,
	open list:GPIO SUBSYSTEM, Andy Shevchenko, kernel,
	Discussions about the Letux Kernel

Hi Tony,

> Am 18.06.2018 um 20:17 schrieb Tony Lindgren <tony@atomide.com>:
> 
> * H. Nikolaus Schaller <hns@goldelico.com> [180618 16:46]:
> 
>> 
>> I can also demonstrate that the duplication has gone:
> 
> OK good to hear.
> 
>> And I was no longer able to reproduce the strcmp(NULL) issue. So it is either better hidden
>> or gone.
> 
> It should not be possible with checks preventing registering
> a group or function with no name. I'll try to repost the whole
> series tomorrow with that added.

Fine.

> 
>> So code just needs group cleanup on failed probing and fixing the mutex around pinctrl_generic_add_group().
>> 
>> I think we need the mutex because a race still can happen when create_pinctrl() is calling pcs_dt_node_to_map()
>> and pinctrl_generic_add_group() w/o being locked on pinctrl_maps_mutex.
>> 
>> The race I suspect is that two drivers are trying to insert the same name and may come
>> both to the conclusion that it does not yet exist. And both insert into the radix tree.
>> 
>> The window of risk is small though... It is in pinctrl_generic_add_group() between calling
>> pinctrl_generic_group_name_to_selector() and radix_tree_insert() so we probably won't
>> see it in real hardware tests.
> 
> Hmm but that race should be already fixed with mutex held
> by the pin controller drivers with these fixes? Or am I
> missing something still?

Hm. Maybe we refer to a different mutex?

I had seen the call sequence

create_pinctrl()-> pinctrl_dt_to_map() -> pcs_dt_node_to_map() -> pinctrl_generic_add_group()

w/o any lock inside.

There is a mutex_lock(&pinctrl_maps_mutex); in create_pinctrl(), but locked after that.

Or is there a lock outside of create_pinctrl()?

If I look into the stack dumps, call nesting is

driver_probe_device() -> pinctrl_bind_pins() -> devm_pinctrl_get() -> create_pinctrl()

They all do no locking.

Maybe I am missing something.

BR,
Nikolaus



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-18 18:30                             ` H. Nikolaus Schaller
@ 2018-06-19  4:34                               ` Tony Lindgren
  2018-06-19  4:51                                 ` H. Nikolaus Schaller
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Lindgren @ 2018-06-19  4:34 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Christ van Willegen, Linus Walleij, Linux Kernel Mailing List,
	open list:GPIO SUBSYSTEM, Andy Shevchenko, kernel,
	Discussions about the Letux Kernel

* H. Nikolaus Schaller <hns@goldelico.com> [180618 18:33]:
> >> So code just needs group cleanup on failed probing and fixing the mutex around pinctrl_generic_add_group().
> >> 
> >> I think we need the mutex because a race still can happen when create_pinctrl() is calling pcs_dt_node_to_map()
> >> and pinctrl_generic_add_group() w/o being locked on pinctrl_maps_mutex.
> >> 
> >> The race I suspect is that two drivers are trying to insert the same name and may come
> >> both to the conclusion that it does not yet exist. And both insert into the radix tree.
> >> 
> >> The window of risk is small though... It is in pinctrl_generic_add_group() between calling
> >> pinctrl_generic_group_name_to_selector() and radix_tree_insert() so we probably won't
> >> see it in real hardware tests.
> > 
> > Hmm but that race should be already fixed with mutex held
> > by the pin controller drivers with these fixes? Or am I
> > missing something still?
> 
> Hm. Maybe we refer to a different mutex?

Yes I think that's the case, you're talking about a different
mutex here :)

> I had seen the call sequence
> 
> create_pinctrl()-> pinctrl_dt_to_map() -> pcs_dt_node_to_map() -> pinctrl_generic_add_group()
> 
> w/o any lock inside.
> 
> There is a mutex_lock(&pinctrl_maps_mutex); in create_pinctrl(), but locked after that.
> 
> Or is there a lock outside of create_pinctrl()?
> 
> If I look into the stack dumps, call nesting is
> 
> driver_probe_device() -> pinctrl_bind_pins() -> devm_pinctrl_get() -> create_pinctrl()
> 
> They all do no locking.
> 
> Maybe I am missing something.

Can you please post a patch for that as you already have it
debugged? That's easier to understand than reading a verbal
patch :)

Regards,

Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-19  4:34                               ` Tony Lindgren
@ 2018-06-19  4:51                                 ` H. Nikolaus Schaller
  2018-06-19  6:11                                   ` Tony Lindgren
  0 siblings, 1 reply; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-19  4:51 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Christ van Willegen, Linus Walleij, Linux Kernel Mailing List,
	open list:GPIO SUBSYSTEM, Andy Shevchenko, kernel,
	Discussions about the Letux Kernel

Hi Tony,

> Am 19.06.2018 um 06:34 schrieb Tony Lindgren <tony@atomide.com>:
> 
> * H. Nikolaus Schaller <hns@goldelico.com> [180618 18:33]:
>>>> So code just needs group cleanup on failed probing and fixing the mutex around pinctrl_generic_add_group().
>>>> 
>>>> I think we need the mutex because a race still can happen when create_pinctrl() is calling pcs_dt_node_to_map()
>>>> and pinctrl_generic_add_group() w/o being locked on pinctrl_maps_mutex.
>>>> 
>>>> The race I suspect is that two drivers are trying to insert the same name and may come
>>>> both to the conclusion that it does not yet exist. And both insert into the radix tree.
>>>> 
>>>> The window of risk is small though... It is in pinctrl_generic_add_group() between calling
>>>> pinctrl_generic_group_name_to_selector() and radix_tree_insert() so we probably won't
>>>> see it in real hardware tests.
>>> 
>>> Hmm but that race should be already fixed with mutex held
>>> by the pin controller drivers with these fixes? Or am I
>>> missing something still?
>> 
>> Hm. Maybe we refer to a different mutex?
> 
> Yes I think that's the case, you're talking about a different
> mutex here :)
> 
>> I had seen the call sequence
>> 
>> create_pinctrl()-> pinctrl_dt_to_map() -> pcs_dt_node_to_map() -> pinctrl_generic_add_group()
>> 
>> w/o any lock inside.
>> 
>> There is a mutex_lock(&pinctrl_maps_mutex); in create_pinctrl(), but locked after that.
>> 
>> Or is there a lock outside of create_pinctrl()?
>> 
>> If I look into the stack dumps, call nesting is
>> 
>> driver_probe_device() -> pinctrl_bind_pins() -> devm_pinctrl_get() -> create_pinctrl()
>> 
>> They all do no locking.
>> 
>> Maybe I am missing something.
> 
> Can you please post a patch for that as you already have it
> debugged? That's easier to understand than reading a verbal
> patch :)

I have no patch for it and all tests were without, but I can suggest a change which IMHO
could solve it:

diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
index eb2b217f5e1e..7d125f9a7804 100644
--- a/drivers/pinctrl/core.c
+++ b/drivers/pinctrl/core.c
@@ -1037,15 +1037,16 @@ static struct pinctrl *create_pinctrl(struct device *dev,
        INIT_LIST_HEAD(&p->states);
        INIT_LIST_HEAD(&p->dt_maps);
 
+       mutex_lock(&pinctrl_maps_mutex);
        ret = pinctrl_dt_to_map(p, pctldev);
        if (ret < 0) {
                kfree(p);
+               mutex_unlock(&pinctrl_maps_mutex);
                return ERR_PTR(ret);
        }
 
        devname = dev_name(dev);
 
-       mutex_lock(&pinctrl_maps_mutex);
        /* Iterate over the pin control maps to locate the right ones */
        for_each_maps(maps_node, i, map) {
                /* Map must be for this device */

Description: we should also protect pinctrl_dt_to_map(), which calls pinctrl_generic_add_group()
and the calls inside pinctrl_generic_add_group() to pinctrl_generic_group_name_to_selector()
and radix_tree_insert() with a mutex.

But I haven't tested this case (because it is unlikely to happen and be testable).

BR,
Nikolaus


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-19  4:51                                 ` H. Nikolaus Schaller
@ 2018-06-19  6:11                                   ` Tony Lindgren
  2018-06-19  6:39                                     ` H. Nikolaus Schaller
  0 siblings, 1 reply; 22+ messages in thread
From: Tony Lindgren @ 2018-06-19  6:11 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Christ van Willegen, Linus Walleij, Linux Kernel Mailing List,
	open list:GPIO SUBSYSTEM, Andy Shevchenko, kernel,
	Discussions about the Letux Kernel

* H. Nikolaus Schaller <hns@goldelico.com> [180619 04:54]:
> >> I had seen the call sequence
> >> 
> >> create_pinctrl()-> pinctrl_dt_to_map() -> pcs_dt_node_to_map() -> pinctrl_generic_add_group()
> >> 
> >> w/o any lock inside.

So the sequence above has mutex added around adding the pin
controller specific functions and groups by the patch series
I posted for both pcs_parse_bits_in_pinctrl_entry() and
pcs_parse_one_pinctrl_entry(). So I think the above should
be fixed now. But please confirm to make sure I'm not mistaken.

> >> There is a mutex_lock(&pinctrl_maps_mutex); in create_pinctrl(), but locked after that.
> >> 
> >> Or is there a lock outside of create_pinctrl()?
> >> 
> >> If I look into the stack dumps, call nesting is
> >> 
> >> driver_probe_device() -> pinctrl_bind_pins() -> devm_pinctrl_get() -> create_pinctrl()
> >> 
> >> They all do no locking.
> >> 
> >> Maybe I am missing something.
> > 
> > Can you please post a patch for that as you already have it
> > debugged? That's easier to understand than reading a verbal
> > patch :)
> 
> I have no patch for it and all tests were without, but I can suggest a change which IMHO
> could solve it:
> 
> diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
> index eb2b217f5e1e..7d125f9a7804 100644
> --- a/drivers/pinctrl/core.c
> +++ b/drivers/pinctrl/core.c
> @@ -1037,15 +1037,16 @@ static struct pinctrl *create_pinctrl(struct device *dev,
>         INIT_LIST_HEAD(&p->states);
>         INIT_LIST_HEAD(&p->dt_maps);
>  
> +       mutex_lock(&pinctrl_maps_mutex);
>         ret = pinctrl_dt_to_map(p, pctldev);
>         if (ret < 0) {
>                 kfree(p);
> +               mutex_unlock(&pinctrl_maps_mutex);
>                 return ERR_PTR(ret);
>         }
>  
>         devname = dev_name(dev);
>  
> -       mutex_lock(&pinctrl_maps_mutex);
>         /* Iterate over the pin control maps to locate the right ones */
>         for_each_maps(maps_node, i, map) {
>                 /* Map must be for this device */
> 
> Description: we should also protect pinctrl_dt_to_map(), which calls pinctrl_generic_add_group()
> and the calls inside pinctrl_generic_add_group() to pinctrl_generic_group_name_to_selector()
> and radix_tree_insert() with a mutex.
> 
> But I haven't tested this case (because it is unlikely to happen and be testable).

Hmm not sure about this one. The groups and functions are
specific to the pin controller driver and must be locked
by the pin controller driver like this series does. So for
pinctrl_generic_add_group() moving the pinctrl_maps_mutex
earlier should not help. We already do the locking for
generic functions and groups like I described above.

For pinctrl maps moving the mutex earlier might make sense
though :) But aren't we doing that already where the list
gets modified in pinctrl_register_map() and when searching
through the list with for_each_maps? Linus got any comments
on this one?

Regards,

Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-19  6:11                                   ` Tony Lindgren
@ 2018-06-19  6:39                                     ` H. Nikolaus Schaller
  2018-06-19  7:15                                       ` Tony Lindgren
  0 siblings, 1 reply; 22+ messages in thread
From: H. Nikolaus Schaller @ 2018-06-19  6:39 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Christ van Willegen, Linus Walleij, Linux Kernel Mailing List,
	open list:GPIO SUBSYSTEM, Andy Shevchenko, kernel,
	Discussions about the Letux Kernel

Hi Tony,

> Am 19.06.2018 um 08:11 schrieb Tony Lindgren <tony@atomide.com>:
> 
> * H. Nikolaus Schaller <hns@goldelico.com> [180619 04:54]:
>>>> I had seen the call sequence
>>>> 
>>>> 
>>>> 
>>>> w/o any lock inside.
> 
> So the sequence above has mutex added around adding the pin
> controller specific functions and groups by the patch series
> I posted for both pcs_parse_bits_in_pinctrl_entry() and
> pcs_parse_one_pinctrl_entry(). So I think the above should
> be fixed now. But please confirm to make sure I'm not mistaken.

Ah, now I see.

My dump_stack() added to pinctrl_generic_add_group() reported

[    6.155487] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[    6.162048] [<c01106ec>] (unwind_backtrace) from [<c010c058>] (show_stack+0x10/0x14)
[    6.170166] [<c010c058>] (show_stack) from [<c074bc28>] (dump_stack+0x7c/0x9c)
[    6.177734] [<c074bc28>] (dump_stack) from [<c042bbec>] (pinctrl_generic_add_group+0x48/0x90)
[    6.186614] [<c042bbec>] (pinctrl_generic_add_group) from [<c043203c>] (pcs_dt_node_to_map+0x4b0/0x81c)
[    6.196441] [<c043203c>] (pcs_dt_node_to_map) from [<c042ffd4>] (pinctrl_dt_to_map+0x1ec/0x2b8)
[    6.205535] [<c042ffd4>] (pinctrl_dt_to_map) from [<c042d028>] (create_pinctrl+0x58/0x2f8)
[    6.214141] [<c042d028>] (create_pinctrl) from [<c042d388>] (devm_pinctrl_get+0x2c/0x6c)
[    6.222625] [<c042d388>] (devm_pinctrl_get) from [<c04f2e9c>] (pinctrl_bind_pins+0x3c/0x138)
[    6.231445] [<c04f2e9c>] (pinctrl_bind_pins) from [<c04d3eb8>] (driver_probe_device+0xe8/0x318)
[    6.240509] [<c04d3eb8>] (driver_probe_device) from [<c04d4168>] (__driver_attach+0x80/0xa4)
[    6.249328] [<c04d4168>] (__driver_attach) from [<c04d264c>] (bus_for_each_dev+0x58/0x7c)

Apparently I didn't notice that pcs_parse_*_pinctrl_entry() are called
inside pcs_dt_node_to_map() and are part of the call sequence.

Hence your new mutex locks calls to pinctrl_generic_add_group() as
needed.

Obviously, the compiler has optimized away the nested calls to static
functions and I had no previous experience with how the whole pinctrl code
works (learning by debugging :).

So it looks sane and no need for further locks.

BR and thanks,
Nikolaus


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Letux-kernel] BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing
  2018-06-19  6:39                                     ` H. Nikolaus Schaller
@ 2018-06-19  7:15                                       ` Tony Lindgren
  0 siblings, 0 replies; 22+ messages in thread
From: Tony Lindgren @ 2018-06-19  7:15 UTC (permalink / raw)
  To: H. Nikolaus Schaller
  Cc: Christ van Willegen, Linus Walleij, Linux Kernel Mailing List,
	open list:GPIO SUBSYSTEM, Andy Shevchenko, kernel,
	Discussions about the Letux Kernel

* H. Nikolaus Schaller <hns@goldelico.com> [180619 06:42]:
> Hi Tony,
> 
> > Am 19.06.2018 um 08:11 schrieb Tony Lindgren <tony@atomide.com>:
> > 
> > * H. Nikolaus Schaller <hns@goldelico.com> [180619 04:54]:
> >>>> I had seen the call sequence
> >>>> 
> >>>> 
> >>>> 
> >>>> w/o any lock inside.
> > 
> > So the sequence above has mutex added around adding the pin
> > controller specific functions and groups by the patch series
> > I posted for both pcs_parse_bits_in_pinctrl_entry() and
> > pcs_parse_one_pinctrl_entry(). So I think the above should
> > be fixed now. But please confirm to make sure I'm not mistaken.
> 
> Ah, now I see.
> 
> My dump_stack() added to pinctrl_generic_add_group() reported
> 
> [    6.155487] Hardware name: Generic OMAP36xx (Flattened Device Tree)
> [    6.162048] [<c01106ec>] (unwind_backtrace) from [<c010c058>] (show_stack+0x10/0x14)
> [    6.170166] [<c010c058>] (show_stack) from [<c074bc28>] (dump_stack+0x7c/0x9c)
> [    6.177734] [<c074bc28>] (dump_stack) from [<c042bbec>] (pinctrl_generic_add_group+0x48/0x90)
> [    6.186614] [<c042bbec>] (pinctrl_generic_add_group) from [<c043203c>] (pcs_dt_node_to_map+0x4b0/0x81c)
> [    6.196441] [<c043203c>] (pcs_dt_node_to_map) from [<c042ffd4>] (pinctrl_dt_to_map+0x1ec/0x2b8)
> [    6.205535] [<c042ffd4>] (pinctrl_dt_to_map) from [<c042d028>] (create_pinctrl+0x58/0x2f8)
> [    6.214141] [<c042d028>] (create_pinctrl) from [<c042d388>] (devm_pinctrl_get+0x2c/0x6c)
> [    6.222625] [<c042d388>] (devm_pinctrl_get) from [<c04f2e9c>] (pinctrl_bind_pins+0x3c/0x138)
> [    6.231445] [<c04f2e9c>] (pinctrl_bind_pins) from [<c04d3eb8>] (driver_probe_device+0xe8/0x318)
> [    6.240509] [<c04d3eb8>] (driver_probe_device) from [<c04d4168>] (__driver_attach+0x80/0xa4)
> [    6.249328] [<c04d4168>] (__driver_attach) from [<c04d264c>] (bus_for_each_dev+0x58/0x7c)
> 
> Apparently I didn't notice that pcs_parse_*_pinctrl_entry() are called
> inside pcs_dt_node_to_map() and are part of the call sequence.
> 
> Hence your new mutex locks calls to pinctrl_generic_add_group() as
> needed.
> 
> Obviously, the compiler has optimized away the nested calls to static
> functions and I had no previous experience with how the whole pinctrl code
> works (learning by debugging :).
> 
> So it looks sane and no need for further locks.

OK thanks for checking it.

Regards,

Tony

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-06-19  7:15 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-13 12:39 BUG: drivers/pinctrl/core: races in pinctrl_groups and deferred probing H. Nikolaus Schaller
2018-06-14 12:01 ` Tony Lindgren
2018-06-14 12:12   ` H. Nikolaus Schaller
2018-06-15  6:58     ` Tony Lindgren
2018-06-15 11:13       ` Tony Lindgren
2018-06-15 11:18         ` H. Nikolaus Schaller
2018-06-16 11:07           ` [Letux-kernel] " H. Nikolaus Schaller
2018-06-18  6:46             ` Tony Lindgren
2018-06-18  8:22             ` Andy Shevchenko
2018-06-18  9:14               ` Tony Lindgren
2018-06-18  9:29                 ` H. Nikolaus Schaller
2018-06-18  9:54                   ` Tony Lindgren
2018-06-18  9:59                     ` H. Nikolaus Schaller
     [not found]                     ` <CA+Ot1OxNj30KgHWJHXgWbnXYkNXTNvd_t-b6+uG3u=z8+pZf2Q@mail.gmail.com>
2018-06-18 11:51                       ` Tony Lindgren
2018-06-18 16:43                         ` H. Nikolaus Schaller
2018-06-18 18:17                           ` Tony Lindgren
2018-06-18 18:30                             ` H. Nikolaus Schaller
2018-06-19  4:34                               ` Tony Lindgren
2018-06-19  4:51                                 ` H. Nikolaus Schaller
2018-06-19  6:11                                   ` Tony Lindgren
2018-06-19  6:39                                     ` H. Nikolaus Schaller
2018-06-19  7:15                                       ` Tony Lindgren

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).