linux-kernel.vger.kernel.org archive mirror
* Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion"
@ 2020-08-21  3:35 Dong Aisheng
  2020-08-21 18:28 ` Saravana Kannan
  2020-08-27  5:17 ` Saravana Kannan
  0 siblings, 2 replies; 7+ messages in thread
From: Dong Aisheng @ 2020-08-21  3:35 UTC (permalink / raw)
  To: open list, linux, gregkh, saravanak, m.szyprowski, naresh.kamboju
  Cc: dl-linux-imx,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Dong Aisheng, Shawn Guo, Sascha Hauer

Hi ALL,

We hit the WARNING below during system suspend on an iMX6Q SDB board
with the latest linus/master branch (v5.9-rc1+) and next-20200820.
The v5.8 kernel is OK, so I bisected and found that it is caused by
the patch below.
Reverting it gets rid of the warning, but I wonder whether there may be
other potential issues.
Any ideas?

Defconfig used is: imx_v6_v7_defconfig

commit 843e600b8a2b01463c4d873a90b2c2ea8033f1f6
Author: Saravana Kannan <saravanak@google.com>
Date:   Thu Jul 16 14:45:23 2020 -0700

    driver core: Fix sleeping in invalid context during device link deletion

    Marek and Guenter reported that commit 287905e68dd2 ("driver core:
    Expose device link details in sysfs") caused sleeping/scheduling while
    atomic warnings.

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
    in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/0:1
    2 locks held by kworker/0:1/12:
      #0: ee8074a8 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
      #1: ee921f20 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
    Preemption disabled at:
    [<c01b10f0>] srcu_invoke_callbacks+0xc0/0x154
    ----- 8< ----- SNIP
    [<c064590c>] (device_del) from [<c0645c9c>] (device_unregister+0x24/0x64)
    [<c0645c9c>] (device_unregister) from [<c01b10fc>] (srcu_invoke_callbacks+0xcc/0x154)
    [<c01b10fc>] (srcu_invoke_callbacks) from [<c01493c4>] (process_one_work+0x234/0x7dc)
    [<c01493c4>] (process_one_work) from [<c01499b0>] (worker_thread+0x44/0x51c)
    [<c01499b0>] (worker_thread) from [<c0150bf4>] (kthread+0x158/0x1a0)
    [<c0150bf4>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20)
    Exception stack(0xee921fb0 to 0xee921ff8)

    This was caused by the device link device being released in the context
    of srcu_invoke_callbacks().  There is no need to wait till the RCU
    callback to release the device link device.  So release the device
    earlier and move the call_srcu() into the device release code. That way,
    the memory will get freed only after the device is released AND the RCU
    callback is called.

    Fixes: 287905e68dd2 ("driver core: Expose device link details in sysfs")
    Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Reported-by: Guenter Roeck <linux@roeck-us.net>
    Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Signed-off-by: Saravana Kannan <saravanak@google.com>
    Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Tested-by: Guenter Roeck <linux@roeck-us.net>
    Link: https://lore.kernel.org/r/20200716214523.2924704-1-saravanak@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Error log:
# echo mem > /sys/power/state
[   39.111865] PM: suspend entry (deep)
[   39.148650] Filesystems sync: 0.032 seconds
[   39.154034]
[   39.155537] ======================================================
[   39.161723] WARNING: possible circular locking dependency detected
[   39.167911] 5.9.0-rc1-00103-g7eac66d0456f #37 Not tainted
[   39.173315] ------------------------------------------------------
[   39.179500] sh/647 is trying to acquire lock:
[   39.183862] c15a310c (dpm_list_mtx){+.+.}-{3:3}, at: dpm_for_each_dev+0x20/0x5c
[   39.191200]
[   39.191200] but task is already holding lock:
[   39.197036] c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4
[   39.203582]
[   39.203582] which lock already depends on the new lock.
[   39.203582]
[   39.211763]
[   39.211763] the existing dependency chain (in reverse order) is:
[   39.219249]
[   39.219249] -> #2 (fw_lock){+.+.}-{3:3}:
[   39.224673]        mutex_lock_nested+0x1c/0x24
[   39.229126]        firmware_uevent+0x18/0xa0
[   39.233411]        dev_uevent+0xc4/0x1f8
[   39.237343]        uevent_show+0x98/0x114
[   39.241362]        dev_attr_show+0x18/0x48
[   39.245472]        sysfs_kf_seq_show+0x84/0xec
[   39.249927]        seq_read+0x138/0x550
[   39.253774]        vfs_read+0x94/0x164
[   39.257529]        ksys_read+0x60/0xe8
[   39.261288]        ret_fast_syscall+0x0/0x28
[   39.265564]        0xbed7c808
[   39.268538]
[   39.268538] -> #1 (kn->active#3){++++}-{0:0}:
[   39.274391]        kernfs_remove_by_name_ns+0x40/0x94
[   39.279450]        device_del+0x144/0x3fc
[   39.283467]        __device_link_del+0x4c/0x70
[   39.287919]        device_link_remove+0x5c/0x8c
[   39.292464]        _regulator_put.part.0+0x104/0x1dc
[   39.297436]        regulator_put+0x2c/0x3c
[   39.299731] regulator regulator.5: Failed to increase supply voltage: -110
[   39.301544]        release_nodes+0x1b4/0x204
[   39.301553]        really_probe+0x104/0x3b4
[   39.316881]        driver_probe_device+0x58/0xb4
[   39.321506]        device_driver_attach+0x58/0x60
[   39.326217]        __driver_attach+0x58/0xd0
[   39.330499]        bus_for_each_dev+0x74/0xbc
[   39.334863]        bus_add_driver+0x150/0x1dc
[   39.339227]        driver_register+0x74/0x108
[   39.343599]        i2c_register_driver+0x38/0x8c
[   39.348227]        do_one_initcall+0x84/0x3b4
[   39.352598]        kernel_init_freeable+0x154/0x1e4
[   39.357485]        kernel_init+0x8/0x118
[   39.361415]        ret_from_fork+0x14/0x20
[   39.365518]        0x0
[   39.367883]
[   39.367883] -> #0 (dpm_list_mtx){+.+.}-{3:3}:
[   39.373740]        lock_acquire+0xe0/0x4ec
[   39.377848]        __mutex_lock+0x94/0x9d0
[   39.381952]        mutex_lock_nested+0x1c/0x24
[   39.386405]        dpm_for_each_dev+0x20/0x5c
[   39.390769]        fw_pm_notify+0xa4/0xd4
[   39.394795]        notifier_call_chain+0x48/0x80
[   39.399420]        __blocking_notifier_call_chain+0x48/0x64
[   39.405003]        __pm_notifier_call_chain+0x20/0x3c
[   39.410063]        pm_suspend+0x1ac/0x438
[   39.414080]        state_store+0x68/0xc8
[   39.418013]        kernfs_fop_write+0x10c/0x22c
[   39.419741] cpu cpu0: failed to scale vddpu up: -110
[   39.422551]        vfs_write+0xbc/0x1d8
[   39.422559]        ksys_write+0x60/0xe8
[   39.427529] cpufreq: __target_index: Failed to change cpu frequency: -110
[   39.431362]        ret_fast_syscall+0x0/0x28
[   39.431368]        0xbeec8958
[   39.431372]
[   39.431372] other info that might help us debug this:
[   39.431372]
[   39.431375] Chain exists of:
[   39.431375]   dpm_list_mtx --> kn->active#3 --> fw_lock
[   39.431375]
[   39.431390]  Possible unsafe locking scenario:
[   39.431390]
[   39.431394]        CPU0                    CPU1
[   39.431398]        ----                    ----
[   39.431401]   lock(fw_lock);
[   39.431412]                                lock(kn->active#3);
[   39.490528]                                lock(fw_lock);
[   39.495934]   lock(dpm_list_mtx);
[   39.499255]
[   39.499255]  *** DEADLOCK ***
[   39.499255]
[   39.505181] 6 locks held by sh/647:
[   39.508675]  #0: ecf48a84 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x14c/0x1d8
[   39.516007]  #1: ed2ced48 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write+0xd0/0x22c
[   39.523684]  #2: ec1ff960 (kn->active#90){.+.+}-{0:0}, at: kernfs_fop_write+0xd8/0x22c
[   39.531620]  #3: c151c4e8 (system_transition_mutex){+.+.}-{3:3}, at: pm_suspend+0x11c/0x438
[   39.539991]  #4: c151e3f4 ((pm_chain_head).rwsem){++++}-{3:3}, at: __blocking_notifier_call_chain+0x2c/0x64
[   39.549753] usb_otg_vbus: disabling
[   39.549755]  #5: c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4
[   39.553264] wm8962-supply: disabling
[   39.560214]
[   39.560214] stack backtrace:
[   39.560225] CPU: 0 PID: 647 Comm: sh Not tainted 5.9.0-rc1-00103-g7eac66d0456f #37
[   39.560230] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   39.560260] [<c0112270>] (unwind_backtrace) from [<c010bfd4>] (show_stack+0x10/0x14)
[   39.560287] [<c010bfd4>] (show_stack) from [<c05af3a0>] (dump_stack+0xe4/0x118)
[   39.597365] [<c05af3a0>] (dump_stack) from [<c0185390>] (check_noncircular+0x130/0x1e4)
[   39.605386] [<c0185390>] (check_noncircular) from [<c018880c>] (__lock_acquire+0x161c/0x31b0)
[   39.613923] [<c018880c>] (__lock_acquire) from [<c018ad00>] (lock_acquire+0xe0/0x4ec)
[   39.621772] [<c018ad00>] (lock_acquire) from [<c0db0bfc>] (__mutex_lock+0x94/0x9d0)
[   39.629445] [<c0db0bfc>] (__mutex_lock) from [<c0db1554>] (mutex_lock_nested+0x1c/0x24)
[   39.637463] [<c0db1554>] (mutex_lock_nested) from [<c07892dc>] (dpm_for_each_dev+0x20/0x5c)
[   39.645828] [<c07892dc>] (dpm_for_each_dev) from [<c0794a60>] (fw_pm_notify+0xa4/0xd4)
[   39.653762] [<c0794a60>] (fw_pm_notify) from [<c0152c7c>] (notifier_call_chain+0x48/0x80)
[   39.661954] [<c0152c7c>] (notifier_call_chain) from [<c01532a8>] (__blocking_notifier_call_chain+0x48/0x64)
[   39.671709] [<c01532a8>] (__blocking_notifier_call_chain) from [<c0192498>] (__pm_notifier_call_chain+0x20/0x3c)
[   39.679740] cpu cpu0: failed to scale vddpu up: -110
[   39.681897] [<c0192498>] (__pm_notifier_call_chain) from [<c0194424>] (pm_suspend+0x1ac/0x438)
[   39.686859] cpufreq: __target_index: Failed to change cpu frequency: -110
[   39.695473] [<c0194424>] (pm_suspend) from [<c019235c>] (state_store+0x68/0xc8)
[   39.695490] [<c019235c>] (state_store) from [<c035af80>] (kernfs_fop_write+0x10c/0x22c)
[   39.695506] [<c035af80>] (kernfs_fop_write) from [<c02b0e38>] (vfs_write+0xbc/0x1d8)
[   39.695524] [<c02b0e38>] (vfs_write) from [<c02b10a0>] (ksys_write+0x60/0xe8)
[   39.704844] VGEN1: disabling
[   39.709627] [<c02b10a0>] (ksys_write) from [<c0100080>] (ret_fast_syscall+0x0/0x28)
[   39.743035] Exception stack(0xed573fa8 to 0xed573ff0)
[   39.748098] 3fa0:                   00000004 01d3caf8 00000001 01d3caf8 00000004 00000000
[   39.756286] 3fc0: 00000004 01d3caf8 b6f40340 00000004 b6ed6c8c 00000000 00000000 00000000
[   39.764470] 3fe0: 00000004 beec8958 b6e75d4f b6e01d16
[   39.770874] VGEN2: disabling
[   39.776106] VGEN3: disabling

[   69.590256] cfg80211: failed to load regulatory.db
[   69.590312] Freezing user space processes ... (elapsed 0.008 seconds) done.
[   69.606341] OOM killer disabled.
[   69.609599] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   69.619021] printk: Suspending console(s) (use no_console_suspend to debug)


* Re: Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion"
  2020-08-21  3:35 Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion" Dong Aisheng
@ 2020-08-21 18:28 ` Saravana Kannan
  2020-08-24  8:18   ` Aisheng Dong
  2020-08-27  5:17 ` Saravana Kannan
  1 sibling, 1 reply; 7+ messages in thread
From: Saravana Kannan @ 2020-08-21 18:28 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: open list, Guenter Roeck, gregkh, Marek Szyprowski,
	Naresh Kamboju, dl-linux-imx,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Dong Aisheng, Shawn Guo, Sascha Hauer

On Thu, Aug 20, 2020 at 8:50 PM Dong Aisheng <dongas86@gmail.com> wrote:
>
> Hi ALL,
>
> We hit the WARNING below during system suspend on an iMX6Q SDB board
> with the latest linus/master branch (v5.9-rc1+) and next-20200820.
> The v5.8 kernel is OK, so I bisected and found that it is caused by
> the patch below.
> Reverting it gets rid of the warning, but I wonder whether there may be
> other potential issues.
> Any ideas?

Thanks for the report. I'll look into this more closely after Linux
Plumbers (next week). We can't just revert the patch you pointed out
because it's fixing another locking issue.

I'm not familiar with the code path of the firmware_uevent stuff. But
if I were to make a guess, the fix will probably be one of:
1. Not having to go through one of these code paths for the "device
link" device.
2. Rewriting the device_del() code in device_link_remove() code path.

If anyone has a fix, I'd be happy to review.

-Saravana



* RE: Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion"
  2020-08-21 18:28 ` Saravana Kannan
@ 2020-08-24  8:18   ` Aisheng Dong
  0 siblings, 0 replies; 7+ messages in thread
From: Aisheng Dong @ 2020-08-24  8:18 UTC (permalink / raw)
  To: Saravana Kannan, Dong Aisheng
  Cc: open list, Guenter Roeck, gregkh, Marek Szyprowski,
	Naresh Kamboju, dl-linux-imx,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Shawn Guo, Sascha Hauer

> From: Saravana Kannan <saravanak@google.com>
> Sent: Saturday, August 22, 2020 2:28 AM
> 
> On Thu, Aug 20, 2020 at 8:50 PM Dong Aisheng <dongas86@gmail.com>
> wrote:
> >
> > Hi ALL,
> >
> > We hit the WARNING below during system suspend on an iMX6Q SDB board
> > with the latest linus/master branch (v5.9-rc1+) and next-20200820.
> > The v5.8 kernel is OK, so I bisected and found that it is caused by
> > the patch below.
> > Reverting it gets rid of the warning, but I wonder whether there may be
> > other potential issues.
> > Any ideas?
> 
> Thanks for the report. I'll look into this more closely after Linux Plumbers (next
> week). We can't just revert the patch you pointed out because it's fixing another
> locking issue.
> 

Thanks. Please CC me when you send out patches.
I'd love to test them.

Regards
Aisheng


* Re: Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion"
  2020-08-21  3:35 Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion" Dong Aisheng
  2020-08-21 18:28 ` Saravana Kannan
@ 2020-08-27  5:17 ` Saravana Kannan
  2020-08-31 22:15   ` Saravana Kannan
  1 sibling, 1 reply; 7+ messages in thread
From: Saravana Kannan @ 2020-08-27  5:17 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: open list, Guenter Roeck, gregkh, Marek Szyprowski,
	Naresh Kamboju, dl-linux-imx,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Dong Aisheng, Shawn Guo, Sascha Hauer

On Thu, Aug 20, 2020 at 8:50 PM Dong Aisheng <dongas86@gmail.com> wrote:
>
> Hi ALL,
>
> We hit the WARNING below during system suspend on an iMX6Q SDB board
> with the latest linus/master branch (v5.9-rc1+) and next-20200820.
> The v5.8 kernel is OK, so I bisected and found that it is caused by
> the patch below.
> Reverting it gets rid of the warning, but I wonder whether there may be
> other potential issues.
> Any ideas?
>
> Defconfig used is: imx_v6_v7_defconfig
>

----- 8< ----- Snipped text that was a bit misleading

>
> Error log:
> # echo mem > /sys/power/state
> [   39.111865] PM: suspend entry (deep)
> [   39.148650] Filesystems sync: 0.032 seconds
> [   39.154034]
> [   39.155537] ======================================================
> [   39.161723] WARNING: possible circular locking dependency detected
> [   39.167911] 5.9.0-rc1-00103-g7eac66d0456f #37 Not tainted
> [   39.173315] ------------------------------------------------------
> [   39.179500] sh/647 is trying to acquire lock:
> [   39.183862] c15a310c (dpm_list_mtx){+.+.}-{3:3}, at: dpm_for_each_dev+0x20/0x5c
> [   39.191200]
> [   39.191200] but task is already holding lock:
> [   39.197036] c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4
> [   39.203582]
> [   39.203582] which lock already depends on the new lock.
> [   39.203582]
> [   39.211763]
> [   39.211763] the existing dependency chain (in reverse order) is:
> [   39.219249]
> [   39.219249] -> #2 (fw_lock){+.+.}-{3:3}:
> [   39.224673]        mutex_lock_nested+0x1c/0x24
> [   39.229126]        firmware_uevent+0x18/0xa0
> [   39.233411]        dev_uevent+0xc4/0x1f8
> [   39.237343]        uevent_show+0x98/0x114
> [   39.241362]        dev_attr_show+0x18/0x48
> [   39.245472]        sysfs_kf_seq_show+0x84/0xec
> [   39.249927]        seq_read+0x138/0x550
> [   39.253774]        vfs_read+0x94/0x164
> [   39.257529]        ksys_read+0x60/0xe8
> [   39.261288]        ret_fast_syscall+0x0/0x28
> [   39.265564]        0xbed7c808
> [   39.268538]
> [   39.268538] -> #1 (kn->active#3){++++}-{0:0}:
> [   39.274391]        kernfs_remove_by_name_ns+0x40/0x94
> [   39.279450]        device_del+0x144/0x3fc

Rafael/Greg,

I'm not very familiar with the #0 and #2 call stacks. From poking
around a bit, though, they are NOT due to the device-link-device. The
new part is the two lines above, which delete the device-link-device
(used to expose device link details in sysfs) when the device link is
deleted.
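
To make the reported cycle concrete, here is a toy model of it. This is
only a sketch in Python and says nothing about how lockdep is actually
implemented. The lock names come from the "Chain exists of" line in the
log; DEPENDS and find_cycle are hypothetical names made up for this
illustration. An edge A -> B means "B was acquired while A was held".

```python
# Toy model of the dependency chain from the lockdep report.
# An edge A -> B means "B was acquired while A was held".
DEPENDS = {
    "dpm_list_mtx": ["kn->active#3"],  # existing chain from the report
    "kn->active#3": ["fw_lock"],       # existing chain from the report
    "fw_lock": [],
}


def find_cycle(graph, held, wanted):
    """Return a path wanted -> ... -> held if one exists, else None.

    If such a path exists, taking `wanted` while holding `held` would
    close a cycle in the dependency graph.
    """
    stack = [(wanted, [wanted])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == held:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return None


# sh/647 holds fw_lock and now wants dpm_list_mtx.
cycle = find_cycle(DEPENDS, held="fw_lock", wanted="dpm_list_mtx")
print(cycle)
```

In this model, dropping the kn->active#3 edge (the one added by the
device_del() call in the link removal path) leaves no path from
dpm_list_mtx back to fw_lock.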

Kicking off a workqueue to break this cycle is easy, but the problem
is that if I queue a work to delete the device, then the sysfs folder
won't get removed immediately. And if the same link is created again
before the work is completed, then there'll be a sysfs name collision
and warning.
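
The name collision with deferred removal can be shown with a small
simulation. This is a sketch in Python under simplifying assumptions,
not driver core code: the set stands in for a sysfs directory, the
deque for a workqueue, and every name here (link_add,
link_del_deferred, run_pending_work, "supplier--consumer") is
hypothetical.

```python
from collections import deque


class NameCollision(Exception):
    """Raised when a name is added while it is still present."""


sysfs = set()      # names currently present in a pretend sysfs directory
pending = deque()  # removals queued to a pretend workqueue


def link_add(name):
    if name in sysfs:
        raise NameCollision(name)
    sysfs.add(name)


def link_del_deferred(name):
    # Removal is only queued; the entry stays visible until the work runs.
    pending.append(name)


def run_pending_work():
    while pending:
        sysfs.discard(pending.popleft())


link_add("supplier--consumer")
link_del_deferred("supplier--consumer")
try:
    link_add("supplier--consumer")  # re-created before the work has run
    collided = False
except NameCollision:
    collided = True

run_pending_work()
link_add("supplier--consumer")      # succeeds once the removal has run
```

The second link_add() collides only because the removal is still
pending, which is the window described above.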

So, I'm kinda stuck here. Open to suggestions. Hoping you'll have
better ideas for breaking the cycle. Or please point out how I'm
misunderstanding the cycle here.

-Saravana

> [   39.283467]        __device_link_del+0x4c/0x70
> [   39.287919]        device_link_remove+0x5c/0x8c
> [   39.292464]        _regulator_put.part.0+0x104/0x1dc
> [   39.297436]        regulator_put+0x2c/0x3c
> [   39.299731] regulator regulator.5: Failed to increase supply voltage: -110
> [   39.301544]        release_nodes+0x1b4/0x204
> [   39.301553]        really_probe+0x104/0x3b4
> [   39.316881]        driver_probe_device+0x58/0xb4
> [   39.321506]        device_driver_attach+0x58/0x60
> [   39.326217]        __driver_attach+0x58/0xd0
> [   39.330499]        bus_for_each_dev+0x74/0xbc
> [   39.334863]        bus_add_driver+0x150/0x1dc
> [   39.339227]        driver_register+0x74/0x108
> [   39.343599]        i2c_register_driver+0x38/0x8c
> [   39.348227]        do_one_initcall+0x84/0x3b4
> [   39.352598]        kernel_init_freeable+0x154/0x1e4
> [   39.357485]        kernel_init+0x8/0x118
> [   39.361415]        ret_from_fork+0x14/0x20
> [   39.365518]        0x0
> [   39.367883]
> [   39.367883] -> #0 (dpm_list_mtx){+.+.}-{3:3}:
> [   39.373740]        lock_acquire+0xe0/0x4ec
> [   39.377848]        __mutex_lock+0x94/0x9d0
> [   39.381952]        mutex_lock_nested+0x1c/0x24
> [   39.386405]        dpm_for_each_dev+0x20/0x5c
> [   39.390769]        fw_pm_notify+0xa4/0xd4
> [   39.394795]        notifier_call_chain+0x48/0x80
> [   39.399420]        __blocking_notifier_call_chain+0x48/0x64
> [   39.405003]        __pm_notifier_call_chain+0x20/0x3c
> [   39.410063]        pm_suspend+0x1ac/0x438
> [   39.414080]        state_store+0x68/0xc8
> [   39.418013]        kernfs_fop_write+0x10c/0x22c
> [   39.419741] cpu cpu0: failed to scale vddpu up: -110
> [   39.422551]        vfs_write+0xbc/0x1d8
> [   39.422559]        ksys_write+0x60/0xe8
> [   39.427529] cpufreq: __target_index: Failed to change cpu frequency: -110
> [   39.431362]        ret_fast_syscall+0x0/0x28
> [   39.431368]        0xbeec8958
> [   39.431372]
> [   39.431372] other info that might help us debug this:
> [   39.431372]
> [   39.431375] Chain exists of:
> [   39.431375]   dpm_list_mtx --> kn->active#3 --> fw_lock
> [   39.431375]
> [   39.431390]  Possible unsafe locking scenario:
> [   39.431390]
> [   39.431394]        CPU0                    CPU1
> [   39.431398]        ----                    ----
> [   39.431401]   lock(fw_lock);
> [   39.431412]                                lock(kn->active#3);
> [   39.490528]                                lock(fw_lock);
> [   39.495934]   lock(dpm_list_mtx);
> [   39.499255]
> [   39.499255]  *** DEADLOCK ***
> [   39.499255]
> [   39.505181] 6 locks held by sh/647:
> [   39.508675]  #0: ecf48a84 (sb_writers#4){.+.+}-{0:0}, at:
> vfs_write+0x14c/0x1d8
> [   39.516007]  #1: ed2ced48 (&of->mutex){+.+.}-{3:3}, at:
> kernfs_fop_write+0xd0/0x22c
> [   39.523684]  #2: ec1ff960 (kn->active#90){.+.+}-{0:0}, at:
> kernfs_fop_write+0xd8/0x22c
> [   39.531620]  #3: c151c4e8 (system_transition_mutex){+.+.}-{3:3},
> at: pm_suspend+0x11c/0x438
> [   39.539991]  #4: c151e3f4 ((pm_chain_head).rwsem){++++}-{3:3}, at:
> __blocking_notifier_call_chain+0x2c/0x64
> [   39.549753] usb_otg_vbus: disabling
> [   39.549755]  #5: c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4
> [   39.553264] wm8962-supply: disabling
> [   39.560214]
> [   39.560214] stack backtrace:
> [   39.560225] CPU: 0 PID: 647 Comm: sh Not tainted 5.9.0-rc1-00103-g7eac66d0456f #37
> [   39.560230] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [   39.560260] [<c0112270>] (unwind_backtrace) from [<c010bfd4>] (show_stack+0x10/0x14)
> [   39.560287] [<c010bfd4>] (show_stack) from [<c05af3a0>] (dump_stack+0xe4/0x118)
> [   39.597365] [<c05af3a0>] (dump_stack) from [<c0185390>] (check_noncircular+0x130/0x1e4)
> [   39.605386] [<c0185390>] (check_noncircular) from [<c018880c>] (__lock_acquire+0x161c/0x31b0)
> [   39.613923] [<c018880c>] (__lock_acquire) from [<c018ad00>] (lock_acquire+0xe0/0x4ec)
> [   39.621772] [<c018ad00>] (lock_acquire) from [<c0db0bfc>] (__mutex_lock+0x94/0x9d0)
> [   39.629445] [<c0db0bfc>] (__mutex_lock) from [<c0db1554>] (mutex_lock_nested+0x1c/0x24)
> [   39.637463] [<c0db1554>] (mutex_lock_nested) from [<c07892dc>] (dpm_for_each_dev+0x20/0x5c)
> [   39.645828] [<c07892dc>] (dpm_for_each_dev) from [<c0794a60>] (fw_pm_notify+0xa4/0xd4)
> [   39.653762] [<c0794a60>] (fw_pm_notify) from [<c0152c7c>] (notifier_call_chain+0x48/0x80)
> [   39.661954] [<c0152c7c>] (notifier_call_chain) from [<c01532a8>] (__blocking_notifier_call_chain+0x48/0x64)
> [   39.671709] [<c01532a8>] (__blocking_notifier_call_chain) from [<c0192498>] (__pm_notifier_call_chain+0x20/0x3c)
> [   39.679740] cpu cpu0: failed to scale vddpu up: -110
> [   39.681897] [<c0192498>] (__pm_notifier_call_chain) from [<c0194424>] (pm_suspend+0x1ac/0x438)
> [   39.686859] cpufreq: __target_index: Failed to change cpu frequency: -110
> [   39.695473] [<c0194424>] (pm_suspend) from [<c019235c>] (state_store+0x68/0xc8)
> [   39.695490] [<c019235c>] (state_store) from [<c035af80>] (kernfs_fop_write+0x10c/0x22c)
> [   39.695506] [<c035af80>] (kernfs_fop_write) from [<c02b0e38>] (vfs_write+0xbc/0x1d8)
> [   39.695524] [<c02b0e38>] (vfs_write) from [<c02b10a0>] (ksys_write+0x60/0xe8)
> [   39.704844] VGEN1: disabling
> [   39.709627] [<c02b10a0>] (ksys_write) from [<c0100080>] (ret_fast_syscall+0x0/0x28)
> [   39.743035] Exception stack(0xed573fa8 to 0xed573ff0)
> [   39.748098] 3fa0:                   00000004 01d3caf8 00000001 01d3caf8 00000004 00000000
> [   39.756286] 3fc0: 00000004 01d3caf8 b6f40340 00000004 b6ed6c8c 00000000 00000000 00000000
> [   39.764470] 3fe0: 00000004 beec8958 b6e75d4f b6e01d16
> [   39.770874] VGEN2: disabling
> [   39.776106] VGEN3: disabling
>
> [   69.590256] cfg80211: failed to load regulatory.db
> [   69.590312] Freezing user space processes ... (elapsed 0.008 seconds) done.
> [   69.606341] OOM killer disabled.
> [   69.609599] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   69.619021] printk: Suspending console(s) (use no_console_suspend to debug)


* Re: Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion"
  2020-08-27  5:17 ` Saravana Kannan
@ 2020-08-31 22:15   ` Saravana Kannan
  2020-09-01  6:07     ` Peng Fan
  0 siblings, 1 reply; 7+ messages in thread
From: Saravana Kannan @ 2020-08-31 22:15 UTC (permalink / raw)
  To: Dong Aisheng
  Cc: open list, Guenter Roeck, gregkh, Marek Szyprowski,
	Naresh Kamboju, dl-linux-imx,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Dong Aisheng, Shawn Guo, Sascha Hauer

On Wed, Aug 26, 2020 at 10:17 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Aug 20, 2020 at 8:50 PM Dong Aisheng <dongas86@gmail.com> wrote:
> >
> > Hi ALL,
> >
> > We met the below WARNING during system suspend on an iMX6Q SDB board
> > with the latest linus/master branch (v5.9-rc1+) and next-20200820.
> > v5.8 kernel is ok. So i did bisect and finally found it's caused by
> > the patch below.
> > Reverting it can get rid of the warning, but I wonder if there may be
> > other potential issues.
> > Any ideas?
> >
> > Defconfig used is: imx_v6_v7_defconfig
> >
>
> ----- 8< ----- Snipped text that was a bit misleading
>
> >
> > Error log:
> > # echo mem > /sys/power/state
> > [   39.111865] PM: suspend entry (deep)
> > [   39.148650] Filesystems sync: 0.032 seconds
> > [   39.154034]
> > [   39.155537] ======================================================
> > [   39.161723] WARNING: possible circular locking dependency detected
> > [   39.167911] 5.9.0-rc1-00103-g7eac66d0456f #37 Not tainted
> > [   39.173315] ------------------------------------------------------
> > [   39.179500] sh/647 is trying to acquire lock:
> > [   39.183862] c15a310c (dpm_list_mtx){+.+.}-{3:3}, at: dpm_for_each_dev+0x20/0x5c
> > [   39.191200]
> > [   39.191200] but task is already holding lock:
> > [   39.197036] c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4
> > [   39.203582]
> > [   39.203582] which lock already depends on the new lock.
> > [   39.203582]
> > [   39.211763]
> > [   39.211763] the existing dependency chain (in reverse order) is:
> > [   39.219249]
> > [   39.219249] -> #2 (fw_lock){+.+.}-{3:3}:
> > [   39.224673]        mutex_lock_nested+0x1c/0x24
> > [   39.229126]        firmware_uevent+0x18/0xa0
> > [   39.233411]        dev_uevent+0xc4/0x1f8
> > [   39.237343]        uevent_show+0x98/0x114
> > [   39.241362]        dev_attr_show+0x18/0x48
> > [   39.245472]        sysfs_kf_seq_show+0x84/0xec
> > [   39.249927]        seq_read+0x138/0x550
> > [   39.253774]        vfs_read+0x94/0x164
> > [   39.257529]        ksys_read+0x60/0xe8
> > [   39.261288]        ret_fast_syscall+0x0/0x28
> > [   39.265564]        0xbed7c808
> > [   39.268538]
> > [   39.268538] -> #1 (kn->active#3){++++}-{0:0}:
> > [   39.274391]        kernfs_remove_by_name_ns+0x40/0x94
> > [   39.279450]        device_del+0x144/0x3fc
>
> Rafael/Greg,
>
> I'm not very familiar with the #0 and #2 calls stacks. But poking
> around a bit, they are NOT due to the device-link-device. But the new
> stuff is the above two lines that are deleting the device-link-device
> (that's used to expose device link details in sysfs) when the device
> link is deleted.
>
> Kicking off a workqueue to break this cycle is easy, but the problem
> is that if I queue a work to delete the device, then the sysfs folder
> won't get removed immediately. And if the same link is created again
> before the work is completed, then there'll be a sysfs name collision
> and warning.
>
> So, I'm kinda stuck here. Open to suggestions. Hoping you'll have
> better ideas for breaking the cycle. Or point out how I'm
> misunderstanding the cycle here.
>

Aisheng,

Sent out a fix that I think should work.
https://lore.kernel.org/lkml/20200831221007.1506441-1-saravanak@google.com/T/#u

I wasn't able to reproduce it on my hardware. So, if you can test that
patch (and respond to that thread), that'd be great.

-Saravana


* RE: Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion"
  2020-08-31 22:15   ` Saravana Kannan
@ 2020-09-01  6:07     ` Peng Fan
  2020-09-01  6:46       ` Saravana Kannan
  0 siblings, 1 reply; 7+ messages in thread
From: Peng Fan @ 2020-09-01  6:07 UTC (permalink / raw)
  To: Saravana Kannan, Dong Aisheng
  Cc: open list, Guenter Roeck, gregkh, Marek Szyprowski,
	Naresh Kamboju, dl-linux-imx,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Aisheng Dong, Shawn Guo, Sascha Hauer

> Subject: Re: Lockdep warning caused by "driver core: Fix sleeping in invalid
> context during device link deletion"
> 
> On Wed, Aug 26, 2020 at 10:17 PM Saravana Kannan
> <saravanak@google.com> wrote:
> >
> > On Thu, Aug 20, 2020 at 8:50 PM Dong Aisheng <dongas86@gmail.com> wrote:
> > >
> > > Hi ALL,
> > >
> > > We met the below WARNING during system suspend on an iMX6Q SDB board
> > > with the latest linus/master branch (v5.9-rc1+) and next-20200820.
> > > v5.8 kernel is ok. So i did bisect and finally found it's caused by
> > > the patch below.
> > > Reverting it can get rid of the warning, but I wonder if there may
> > > be other potential issues.
> > > Any ideas?
> > >
> > > Defconfig used is: imx_v6_v7_defconfig
> > >
> >
> > ----- 8< ----- Snipped text that was a bit misleading
> >
> > >
> > > Error log:
> > > # echo mem > /sys/power/state
> > > [   39.111865] PM: suspend entry (deep)
> > > [   39.148650] Filesystems sync: 0.032 seconds
> > > [   39.154034]
> > > [   39.155537] ======================================================
> > > [   39.161723] WARNING: possible circular locking dependency detected
> > > [   39.167911] 5.9.0-rc1-00103-g7eac66d0456f #37 Not tainted
> > > [   39.173315] ------------------------------------------------------
> > > [   39.179500] sh/647 is trying to acquire lock:
> > > [   39.183862] c15a310c (dpm_list_mtx){+.+.}-{3:3}, at: dpm_for_each_dev+0x20/0x5c
> > > [   39.191200]
> > > [   39.191200] but task is already holding lock:
> > > [   39.197036] c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4
> > > [   39.203582]
> > > [   39.203582] which lock already depends on the new lock.
> > > [   39.203582]
> > > [   39.211763]
> > > [   39.211763] the existing dependency chain (in reverse order) is:
> > > [   39.219249]
> > > [   39.219249] -> #2 (fw_lock){+.+.}-{3:3}:
> > > [   39.224673]        mutex_lock_nested+0x1c/0x24
> > > [   39.229126]        firmware_uevent+0x18/0xa0
> > > [   39.233411]        dev_uevent+0xc4/0x1f8
> > > [   39.237343]        uevent_show+0x98/0x114
> > > [   39.241362]        dev_attr_show+0x18/0x48
> > > [   39.245472]        sysfs_kf_seq_show+0x84/0xec
> > > [   39.249927]        seq_read+0x138/0x550
> > > [   39.253774]        vfs_read+0x94/0x164
> > > [   39.257529]        ksys_read+0x60/0xe8
> > > [   39.261288]        ret_fast_syscall+0x0/0x28
> > > [   39.265564]        0xbed7c808
> > > [   39.268538]
> > > [   39.268538] -> #1 (kn->active#3){++++}-{0:0}:
> > > [   39.274391]        kernfs_remove_by_name_ns+0x40/0x94
> > > [   39.279450]        device_del+0x144/0x3fc
> >
> > Rafael/Greg,
> >
> > I'm not very familiar with the #0 and #2 calls stacks. But poking
> > around a bit, they are NOT due to the device-link-device. But the new
> > stuff is the above two lines that are deleting the device-link-device
> > (that's used to expose device link details in sysfs) when the device
> > link is deleted.
> >
> > Kicking off a workqueue to break this cycle is easy, but the problem
> > is that if I queue a work to delete the device, then the sysfs folder
> > won't get removed immediately. And if the same link is created again
> > before the work is completed, then there'll be a sysfs name collision
> > and warning.
> >
> > So, I'm kinda stuck here. Open to suggestions. Hoping you'll have
> > better ideas for breaking the cycle. Or point out how I'm
> > misunderstanding the cycle here.
> >
> 
> Aisheng,
> 
> Sent out a fix that I think should work.
> https://lore.kernel.org/lkml/20200831221007.1506441-1-saravanak@google.com/T/#u
> 
> I wasn't able to reproduce it on my hardware. So, if you can test that patch
> (and respond to that thread), that'd be great.

I did not find your patch in my mailbox, but I tested it anyway.

Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX7ULP EVK)

Regards,
Peng.

> 
> -Saravana


* Re: Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion"
  2020-09-01  6:07     ` Peng Fan
@ 2020-09-01  6:46       ` Saravana Kannan
  0 siblings, 0 replies; 7+ messages in thread
From: Saravana Kannan @ 2020-09-01  6:46 UTC (permalink / raw)
  To: Peng Fan
  Cc: Dong Aisheng, open list, Guenter Roeck, gregkh, Marek Szyprowski,
	Naresh Kamboju, dl-linux-imx,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Aisheng Dong, Shawn Guo, Sascha Hauer

On Mon, Aug 31, 2020 at 11:07 PM Peng Fan <peng.fan@nxp.com> wrote:
>
> > Subject: Re: Lockdep warning caused by "driver core: Fix sleeping in invalid
> > context during device link deletion"
> >
> > On Wed, Aug 26, 2020 at 10:17 PM Saravana Kannan
> > <saravanak@google.com> wrote:
> > >
> > > On Thu, Aug 20, 2020 at 8:50 PM Dong Aisheng <dongas86@gmail.com> wrote:
> > > >
> > > > Hi ALL,
> > > >
> > > > We met the below WARNING during system suspend on an iMX6Q SDB board
> > > > with the latest linus/master branch (v5.9-rc1+) and next-20200820.
> > > > v5.8 kernel is ok. So i did bisect and finally found it's caused by
> > > > the patch below.
> > > > Reverting it can get rid of the warning, but I wonder if there may
> > > > be other potential issues.
> > > > Any ideas?
> > > >
> > > > Defconfig used is: imx_v6_v7_defconfig
> > > >
> > >
> > > ----- 8< ----- Snipped text that was a bit misleading
> > >
> > > >
> > > > Error log:
> > > > # echo mem > /sys/power/state
> > > > [   39.111865] PM: suspend entry (deep)
> > > > [   39.148650] Filesystems sync: 0.032 seconds
> > > > [   39.154034]
> > > > [   39.155537] ======================================================
> > > > [   39.161723] WARNING: possible circular locking dependency detected
> > > > [   39.167911] 5.9.0-rc1-00103-g7eac66d0456f #37 Not tainted
> > > > [   39.173315] ------------------------------------------------------
> > > > [   39.179500] sh/647 is trying to acquire lock:
> > > > [   39.183862] c15a310c (dpm_list_mtx){+.+.}-{3:3}, at: dpm_for_each_dev+0x20/0x5c
> > > > [   39.191200]
> > > > [   39.191200] but task is already holding lock:
> > > > [   39.197036] c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4
> > > > [   39.203582]
> > > > [   39.203582] which lock already depends on the new lock.
> > > > [   39.203582]
> > > > [   39.211763]
> > > > [   39.211763] the existing dependency chain (in reverse order) is:
> > > > [   39.219249]
> > > > [   39.219249] -> #2 (fw_lock){+.+.}-{3:3}:
> > > > [   39.224673]        mutex_lock_nested+0x1c/0x24
> > > > [   39.229126]        firmware_uevent+0x18/0xa0
> > > > [   39.233411]        dev_uevent+0xc4/0x1f8
> > > > [   39.237343]        uevent_show+0x98/0x114
> > > > [   39.241362]        dev_attr_show+0x18/0x48
> > > > [   39.245472]        sysfs_kf_seq_show+0x84/0xec
> > > > [   39.249927]        seq_read+0x138/0x550
> > > > [   39.253774]        vfs_read+0x94/0x164
> > > > [   39.257529]        ksys_read+0x60/0xe8
> > > > [   39.261288]        ret_fast_syscall+0x0/0x28
> > > > [   39.265564]        0xbed7c808
> > > > [   39.268538]
> > > > [   39.268538] -> #1 (kn->active#3){++++}-{0:0}:
> > > > [   39.274391]        kernfs_remove_by_name_ns+0x40/0x94
> > > > [   39.279450]        device_del+0x144/0x3fc
> > >
> > > Rafael/Greg,
> > >
> > > I'm not very familiar with the #0 and #2 calls stacks. But poking
> > > around a bit, they are NOT due to the device-link-device. But the new
> > > stuff is the above two lines that are deleting the device-link-device
> > > (that's used to expose device link details in sysfs) when the device
> > > link is deleted.
> > >
> > > Kicking off a workqueue to break this cycle is easy, but the problem
> > > is that if I queue a work to delete the device, then the sysfs folder
> > > won't get removed immediately. And if the same link is created again
> > > before the work is completed, then there'll be a sysfs name collision
> > > and warning.
> > >
> > > So, I'm kinda stuck here. Open to suggestions. Hoping you'll have
> > > better ideas for breaking the cycle. Or point out how I'm
> > > misunderstanding the cycle here.
> > >
> >
> > Aisheng,
> >
> > Sent out a fix that I think should work.
> > https://lore.kernel.org/lkml/20200831221007.1506441-1-saravanak@google.com/T/#u
> >
> > I wasn't able to reproduce it on my hardware. So, if you can test that patch
> > (and respond to that thread), that'd be great.
>
> I did not find your patch in my mailbox, but I tested it anyway.

Sorry I forgot to CC everyone from the original email!

>
> Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX7ULP EVK)

Thanks for testing!

-Saravana


Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-21  3:35 Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion" Dong Aisheng
2020-08-21 18:28 ` Saravana Kannan
2020-08-24  8:18   ` Aisheng Dong
2020-08-27  5:17 ` Saravana Kannan
2020-08-31 22:15   ` Saravana Kannan
2020-09-01  6:07     ` Peng Fan
2020-09-01  6:46       ` Saravana Kannan
