* Adreno devfreq lockdep splat with 6.3-rc2
@ 2023-03-15  9:19 Johan Hovold
  2023-06-08 14:13 ` Johan Hovold
From: Johan Hovold @ 2023-03-15  9:19 UTC (permalink / raw)
  To: Rob Clark
  Cc: Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel, linux-arm-msm, freedreno, linux-kernel

Hi Rob,

Since 6.3-rc2 (or possibly -rc1), I'm now seeing the below
devfreq-related lockdep splat.

I noticed that you posted a fix for something similar here:

	https://lore.kernel.org/r/20230312204150.1353517-9-robdclark@gmail.com

but that particular patch makes no difference.

From skimming the calltraces below and qos/devfreq related changes in
6.3-rc1 it seems like this could be related to:

	fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS constraint for idle clamp")

Johan


[   35.389822] ======================================================
[   35.389824] WARNING: possible circular locking dependency detected
[   35.389826] 6.3.0-rc2 #208 Not tainted
[   35.389828] ------------------------------------------------------
[   35.389829] ring0/348 is trying to acquire lock:
[   35.389830] ffff43424cfa2078 (&devfreq->lock){+.+.}-{3:3}, at: qos_min_notifier_call+0x28/0x88
[   35.389845] 
               but task is already holding lock:
[   35.389846] ffff4342432b78e8 (&(c->notifiers)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x34/0xa0
[   35.389855] 
               which lock already depends on the new lock.

[   35.389856] 
               the existing dependency chain (in reverse order) is:
[   35.389857] 
               -> #4 (&(c->notifiers)->rwsem){++++}-{3:3}:
[   35.389861]        lock_acquire+0x68/0x84
[   35.389865]        down_write+0x58/0xfc
[   35.389869]        blocking_notifier_chain_register+0x30/0x8c
[   35.389872]        freq_qos_add_notifier+0x68/0x7c
[   35.389876]        dev_pm_qos_add_notifier+0xe8/0x114
[   35.389881]        devfreq_add_device.part.0+0x360/0x5a4
[   35.389884]        devm_devfreq_add_device+0x74/0xe0
[   35.389886]        msm_devfreq_init+0xa0/0x154 [msm]
[   35.389915]        msm_gpu_init+0x320/0x5b0 [msm]
[   35.389933]        adreno_gpu_init+0x164/0x2d8 [msm]
[   35.389951]        a6xx_gpu_init+0x270/0x608 [msm]
[   35.389968]        adreno_bind+0x184/0x284 [msm]
[   35.389983]        component_bind_all+0x124/0x288
[   35.389989]        msm_drm_bind+0x1d8/0x6a8 [msm]
[   35.390004]        try_to_bring_up_aggregate_device+0x1ec/0x2f4
[   35.390007]        __component_add+0xa8/0x194
[   35.390010]        component_add+0x14/0x20
[   35.390012]        dp_display_probe+0x2b4/0x474 [msm]
[   35.390029]        platform_probe+0x68/0xd8
[   35.390031]        really_probe+0x184/0x3c8
[   35.390034]        __driver_probe_device+0x7c/0x188
[   35.390036]        driver_probe_device+0x3c/0x110
[   35.390039]        __device_attach_driver+0xbc/0x158
[   35.390041]        bus_for_each_drv+0x84/0xe0
[   35.390044]        __device_attach+0xa8/0x1d4
[   35.390046]        device_initial_probe+0x14/0x20
[   35.390049]        bus_probe_device+0xac/0xb0
[   35.390051]        deferred_probe_work_func+0xa0/0xf4
[   35.390053]        process_one_work+0x288/0x6c4
[   35.390056]        worker_thread+0x74/0x450
[   35.390058]        kthread+0x118/0x11c
[   35.390060]        ret_from_fork+0x10/0x20
[   35.390063] 
               -> #3 (dev_pm_qos_mtx){+.+.}-{3:3}:
[   35.390066]        lock_acquire+0x68/0x84
[   35.390068]        __mutex_lock+0x98/0x428
[   35.390072]        mutex_lock_nested+0x2c/0x38
[   35.390074]        dev_pm_qos_remove_notifier+0x34/0x140
[   35.390077]        genpd_remove_device+0x3c/0x174
[   35.390081]        genpd_dev_pm_detach+0x78/0x1b4
[   35.390083]        dev_pm_domain_detach+0x24/0x34
[   35.390085]        a6xx_gmu_remove+0x64/0xd0 [msm]
[   35.390101]        a6xx_destroy+0xa8/0x138 [msm]
[   35.390116]        adreno_unbind+0x40/0x64 [msm]
[   35.390131]        component_unbind+0x38/0x6c
[   35.390134]        component_unbind_all+0xc8/0xd4
[   35.390136]        msm_drm_uninit.isra.0+0x168/0x1dc [msm]
[   35.390152]        msm_drm_bind+0x2f4/0x6a8 [msm]
[   35.390167]        try_to_bring_up_aggregate_device+0x1ec/0x2f4
[   35.390170]        __component_add+0xa8/0x194
[   35.390172]        component_add+0x14/0x20
[   35.390175]        dp_display_probe+0x2b4/0x474 [msm]
[   35.390190]        platform_probe+0x68/0xd8
[   35.390192]        really_probe+0x184/0x3c8
[   35.390194]        __driver_probe_device+0x7c/0x188
[   35.390197]        driver_probe_device+0x3c/0x110
[   35.390199]        __device_attach_driver+0xbc/0x158
[   35.390201]        bus_for_each_drv+0x84/0xe0
[   35.390203]        __device_attach+0xa8/0x1d4
[   35.390206]        device_initial_probe+0x14/0x20
[   35.390208]        bus_probe_device+0xac/0xb0
[   35.390210]        deferred_probe_work_func+0xa0/0xf4
[   35.390212]        process_one_work+0x288/0x6c4
[   35.390214]        worker_thread+0x74/0x450
[   35.390216]        kthread+0x118/0x11c
[   35.390217]        ret_from_fork+0x10/0x20
[   35.390219] 
               -> #2 (&gmu->lock){+.+.}-{3:3}:
[   35.390222]        lock_acquire+0x68/0x84
[   35.390224]        __mutex_lock+0x98/0x428
[   35.390226]        mutex_lock_nested+0x2c/0x38
[   35.390229]        a6xx_gpu_set_freq+0x30/0x5c [msm]
[   35.390245]        msm_devfreq_target+0xb4/0x218 [msm]
[   35.390260]        devfreq_set_target+0x84/0x2f4
[   35.390262]        devfreq_update_target+0xc4/0xec
[   35.390263]        devfreq_monitor+0x38/0x1f0
[   35.390265]        process_one_work+0x288/0x6c4
[   35.390267]        worker_thread+0x74/0x450
[   35.390269]        kthread+0x118/0x11c
[   35.390270]        ret_from_fork+0x10/0x20
[   35.390272] 
               -> #1 (&df->lock){+.+.}-{3:3}:
[   35.390275]        lock_acquire+0x68/0x84
[   35.390276]        __mutex_lock+0x98/0x428
[   35.390279]        mutex_lock_nested+0x2c/0x38
[   35.390281]        msm_devfreq_get_dev_status+0x48/0x134 [msm]
[   35.390296]        devfreq_simple_ondemand_func+0x3c/0x144
[   35.390298]        devfreq_update_target+0x4c/0xec
[   35.390300]        devfreq_monitor+0x38/0x1f0
[   35.390302]        process_one_work+0x288/0x6c4
[   35.390304]        worker_thread+0x74/0x450
[   35.390305]        kthread+0x118/0x11c
[   35.390307]        ret_from_fork+0x10/0x20
[   35.390308] 
               -> #0 (&devfreq->lock){+.+.}-{3:3}:
[   35.390311]        __lock_acquire+0x1394/0x21fc
[   35.390313]        lock_acquire.part.0+0xc4/0x1fc
[   35.390314]        lock_acquire+0x68/0x84
[   35.390316]        __mutex_lock+0x98/0x428
[   35.390319]        mutex_lock_nested+0x2c/0x38
[   35.390321]        qos_min_notifier_call+0x28/0x88
[   35.390323]        blocking_notifier_call_chain+0x6c/0xa0
[   35.390325]        pm_qos_update_target+0xdc/0x24c
[   35.390327]        freq_qos_apply+0x68/0x74
[   35.390329]        apply_constraint+0x100/0x148
[   35.390331]        __dev_pm_qos_update_request+0xb8/0x278
[   35.390333]        dev_pm_qos_update_request+0x3c/0x64
[   35.390335]        msm_devfreq_active+0xf8/0x194 [msm]
[   35.390350]        msm_gpu_submit+0x18c/0x1a8 [msm]
[   35.390365]        msm_job_run+0xbc/0x140 [msm]
[   35.390380]        drm_sched_main+0x1a0/0x528 [gpu_sched]
[   35.390387]        kthread+0x118/0x11c
[   35.390388]        ret_from_fork+0x10/0x20
[   35.390390] 
               other info that might help us debug this:

[   35.390391] Chain exists of:
                 &devfreq->lock --> dev_pm_qos_mtx --> &(c->notifiers)->rwsem

[   35.390395]  Possible unsafe locking scenario:

[   35.390396]        CPU0                    CPU1
[   35.390397]        ----                    ----
[   35.390397]   lock(&(c->notifiers)->rwsem);
[   35.390399]                                lock(dev_pm_qos_mtx);
[   35.390401]                                lock(&(c->notifiers)->rwsem);
[   35.390403]   lock(&devfreq->lock);
[   35.390405] 
                *** DEADLOCK ***

[   35.390406] 4 locks held by ring0/348:
[   35.390407]  #0: ffff43424cfa1170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0xb0/0x140 [msm]
[   35.390426]  #1: ffff43424cfa1208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xdc/0x1a8 [msm]
[   35.390443]  #2: ffffdbf2a5472718 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x30/0x64
[   35.390448]  #3: ffff4342432b78e8 (&(c->notifiers)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x34/0xa0
[   35.390452] 
               stack backtrace:
[   35.390454] CPU: 4 PID: 348 Comm: ring0 Not tainted 6.3.0-rc2 #208
[   35.390456] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET32D (1.04 ) 03/16/2020
[   35.390458] Call trace:
[   35.390460]  dump_backtrace+0xa4/0x128
[   35.390463]  show_stack+0x18/0x24
[   35.390465]  dump_stack_lvl+0x60/0xac
[   35.390469]  dump_stack+0x18/0x24
[   35.390470]  print_circular_bug+0x24c/0x2f8
[   35.390472]  check_noncircular+0x134/0x148
[   35.390473]  __lock_acquire+0x1394/0x21fc
[   35.390475]  lock_acquire.part.0+0xc4/0x1fc
[   35.390477]  lock_acquire+0x68/0x84
[   35.390478]  __mutex_lock+0x98/0x428
[   35.390481]  mutex_lock_nested+0x2c/0x38
[   35.390483]  qos_min_notifier_call+0x28/0x88
[   35.390485]  blocking_notifier_call_chain+0x6c/0xa0
[   35.390487]  pm_qos_update_target+0xdc/0x24c
[   35.390489]  freq_qos_apply+0x68/0x74
[   35.390491]  apply_constraint+0x100/0x148
[   35.390493]  __dev_pm_qos_update_request+0xb8/0x278
[   35.390495]  dev_pm_qos_update_request+0x3c/0x64
[   35.390497]  msm_devfreq_active+0xf8/0x194 [msm]
[   35.390512]  msm_gpu_submit+0x18c/0x1a8 [msm]
[   35.390527]  msm_job_run+0xbc/0x140 [msm]
[   35.390542]  drm_sched_main+0x1a0/0x528 [gpu_sched]
[   35.390547]  kthread+0x118/0x11c
[   35.390548]  ret_from_fork+0x10/0x20
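
The cycle above boils down to the usual two-path inversion: one path takes
&devfreq->lock and eventually ends up under the notifier rwsem (via
dev_pm_qos_mtx), while the qos-notifier path already holds the rwsem when
qos_min_notifier_call() tries to take &devfreq->lock. A minimal,
self-contained userspace sketch of that pattern (all names made up, nothing
here is the actual kernel code):

#include <pthread.h>

/* Stand-ins for &devfreq->lock and &(c->notifiers)->rwsem. */
static pthread_mutex_t devfreq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t notifier_lock = PTHREAD_MUTEX_INITIALIZER;

/* Path 1: a devfreq update eventually reaches the notifier lock. */
static void *devfreq_path(void *arg)
{
	pthread_mutex_lock(&devfreq_lock);
	pthread_mutex_lock(&notifier_lock);	/* devfreq_lock --> notifier_lock */
	pthread_mutex_unlock(&notifier_lock);
	pthread_mutex_unlock(&devfreq_lock);
	return NULL;
}

/* Path 2: a qos update runs the notifier chain, whose callback wants devfreq_lock. */
static void *qos_path(void *arg)
{
	pthread_mutex_lock(&notifier_lock);
	pthread_mutex_lock(&devfreq_lock);	/* notifier_lock --> devfreq_lock */
	pthread_mutex_unlock(&devfreq_lock);
	pthread_mutex_unlock(&notifier_lock);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, devfreq_path, NULL);
	pthread_create(&t2, NULL, qos_path, NULL);
	pthread_join(t1, NULL);	/* may never return if the two paths interleave */
	pthread_join(t2, NULL);
	return 0;
}

Lockdep flags this as soon as both orderings have been observed, even though
the chain in between is four locks deep and hard to spot by eye.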


* Re: Adreno devfreq lockdep splat with 6.3-rc2
  2023-03-15  9:19 Adreno devfreq lockdep splat with 6.3-rc2 Johan Hovold
@ 2023-06-08 14:13 ` Johan Hovold
  2023-06-08 21:17   ` Rob Clark
From: Johan Hovold @ 2023-06-08 14:13 UTC (permalink / raw)
  To: Rob Clark
  Cc: Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel, linux-arm-msm, freedreno, linux-kernel

Hi Rob,

Have you had a chance to look at this regression yet? It prevents us
from using lockdep on the X13s as lockdep is disabled as soon as we
start the GPU.

On Wed, Mar 15, 2023 at 10:19:21AM +0100, Johan Hovold wrote:
> 
> Since 6.3-rc2 (or possibly -rc1), I'm now seeing the below
> devfreq-related lockdep splat.
> 
> I noticed that you posted a fix for something similar here:
> 
> 	https://lore.kernel.org/r/20230312204150.1353517-9-robdclark@gmail.com
> 
> but that particular patch makes no difference.
> 
> From skimming the calltraces below and qos/devfreq related changes in
> 6.3-rc1 it seems like this could be related to:
> 
> 	fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS constraint for idle clamp")

Below is an updated splat from 6.4-rc5.

Johan

[ 2941.931507] ======================================================
[ 2941.931509] WARNING: possible circular locking dependency detected
[ 2941.931513] 6.4.0-rc5 #64 Not tainted
[ 2941.931516] ------------------------------------------------------
[ 2941.931518] ring0/359 is trying to acquire lock:
[ 2941.931520] ffff63310e35c078 (&devfreq->lock){+.+.}-{3:3}, at: qos_min_notifier_call+0x28/0x88
[ 2941.931541] 
               but task is already holding lock:
[ 2941.931543] ffff63310e3cace8 (&(c->notifiers)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x30/0x70
[ 2941.931553] 
               which lock already depends on the new lock.

[ 2941.931555] 
               the existing dependency chain (in reverse order) is:
[ 2941.931556] 
               -> #4 (&(c->notifiers)->rwsem){++++}-{3:3}:
[ 2941.931562]        down_write+0x50/0x198
[ 2941.931567]        blocking_notifier_chain_register+0x30/0x8c
[ 2941.931570]        freq_qos_add_notifier+0x68/0x7c
[ 2941.931574]        dev_pm_qos_add_notifier+0xa0/0xf8
[ 2941.931579]        devfreq_add_device.part.0+0x360/0x5a4
[ 2941.931583]        devm_devfreq_add_device+0x74/0xe0
[ 2941.931587]        msm_devfreq_init+0xa0/0x154 [msm]
[ 2941.931624]        msm_gpu_init+0x2fc/0x588 [msm]
[ 2941.931649]        adreno_gpu_init+0x160/0x2d0 [msm]
[ 2941.931675]        a6xx_gpu_init+0x2c0/0x74c [msm]
[ 2941.931699]        adreno_bind+0x180/0x290 [msm]
[ 2941.931723]        component_bind_all+0x124/0x288
[ 2941.931728]        msm_drm_bind+0x1d8/0x6cc [msm]
[ 2941.931752]        try_to_bring_up_aggregate_device+0x1ec/0x2f4
[ 2941.931755]        __component_add+0xa8/0x194
[ 2941.931758]        component_add+0x14/0x20
[ 2941.931761]        dp_display_probe+0x2b4/0x474 [msm]
[ 2941.931785]        platform_probe+0x68/0xd8
[ 2941.931789]        really_probe+0x184/0x3c8
[ 2941.931792]        __driver_probe_device+0x7c/0x16c
[ 2941.931794]        driver_probe_device+0x3c/0x110
[ 2941.931797]        __device_attach_driver+0xbc/0x158
[ 2941.931800]        bus_for_each_drv+0x84/0xe0
[ 2941.931802]        __device_attach+0xa8/0x1d4
[ 2941.931805]        device_initial_probe+0x14/0x20
[ 2941.931807]        bus_probe_device+0xb0/0xb4
[ 2941.931810]        deferred_probe_work_func+0xa0/0xf4
[ 2941.931812]        process_one_work+0x288/0x5bc
[ 2941.931816]        worker_thread+0x74/0x450
[ 2941.931818]        kthread+0x124/0x128
[ 2941.931822]        ret_from_fork+0x10/0x20
[ 2941.931826] 
               -> #3 (dev_pm_qos_mtx){+.+.}-{3:3}:
[ 2941.931831]        __mutex_lock+0xa0/0x840
[ 2941.931833]        mutex_lock_nested+0x24/0x30
[ 2941.931836]        dev_pm_qos_remove_notifier+0x34/0x140
[ 2941.931838]        genpd_remove_device+0x3c/0x174
[ 2941.931841]        genpd_dev_pm_detach+0x78/0x1b4
[ 2941.931844]        dev_pm_domain_detach+0x24/0x34
[ 2941.931846]        a6xx_gmu_remove+0x34/0xc4 [msm]
[ 2941.931869]        a6xx_destroy+0xd0/0x160 [msm]
[ 2941.931892]        adreno_unbind+0x40/0x64 [msm]
[ 2941.931916]        component_unbind+0x38/0x6c
[ 2941.931919]        component_unbind_all+0xc8/0xd4
[ 2941.931921]        msm_drm_uninit.isra.0+0x150/0x1c4 [msm]
[ 2941.931945]        msm_drm_bind+0x310/0x6cc [msm]
[ 2941.931967]        try_to_bring_up_aggregate_device+0x1ec/0x2f4
[ 2941.931970]        __component_add+0xa8/0x194
[ 2941.931973]        component_add+0x14/0x20
[ 2941.931976]        dp_display_probe+0x2b4/0x474 [msm]
[ 2941.932000]        platform_probe+0x68/0xd8
[ 2941.932003]        really_probe+0x184/0x3c8
[ 2941.932005]        __driver_probe_device+0x7c/0x16c
[ 2941.932008]        driver_probe_device+0x3c/0x110
[ 2941.932011]        __device_attach_driver+0xbc/0x158
[ 2941.932014]        bus_for_each_drv+0x84/0xe0
[ 2941.932016]        __device_attach+0xa8/0x1d4
[ 2941.932018]        device_initial_probe+0x14/0x20
[ 2941.932021]        bus_probe_device+0xb0/0xb4
[ 2941.932023]        deferred_probe_work_func+0xa0/0xf4
[ 2941.932026]        process_one_work+0x288/0x5bc
[ 2941.932028]        worker_thread+0x74/0x450
[ 2941.932031]        kthread+0x124/0x128
[ 2941.932035]        ret_from_fork+0x10/0x20
[ 2941.932037] 
               -> #2 (&gmu->lock){+.+.}-{3:3}:
[ 2941.932043]        __mutex_lock+0xa0/0x840
[ 2941.932045]        mutex_lock_nested+0x24/0x30
[ 2941.932047]        a6xx_gpu_set_freq+0x30/0x5c [msm]
[ 2941.932071]        msm_devfreq_target+0xb8/0x1a8 [msm]
[ 2941.932094]        devfreq_set_target+0x84/0x27c
[ 2941.932098]        devfreq_update_target+0xc4/0xec
[ 2941.932102]        devfreq_monitor+0x38/0x170
[ 2941.932105]        process_one_work+0x288/0x5bc
[ 2941.932108]        worker_thread+0x74/0x450
[ 2941.932110]        kthread+0x124/0x128
[ 2941.932113]        ret_from_fork+0x10/0x20
[ 2941.932116] 
               -> #1 (&df->lock){+.+.}-{3:3}:
[ 2941.932121]        __mutex_lock+0xa0/0x840
[ 2941.932124]        mutex_lock_nested+0x24/0x30
[ 2941.932126]        msm_devfreq_get_dev_status+0x48/0x134 [msm]
[ 2941.932149]        devfreq_simple_ondemand_func+0x3c/0x144
[ 2941.932153]        devfreq_update_target+0x4c/0xec
[ 2941.932157]        devfreq_monitor+0x38/0x170
[ 2941.932160]        process_one_work+0x288/0x5bc
[ 2941.932162]        worker_thread+0x74/0x450
[ 2941.932165]        kthread+0x124/0x128
[ 2941.932168]        ret_from_fork+0x10/0x20
[ 2941.932171] 
               -> #0 (&devfreq->lock){+.+.}-{3:3}:
[ 2941.932175]        __lock_acquire+0x13d8/0x2188
[ 2941.932178]        lock_acquire+0x1e8/0x310
[ 2941.932180]        __mutex_lock+0xa0/0x840
[ 2941.932182]        mutex_lock_nested+0x24/0x30
[ 2941.932184]        qos_min_notifier_call+0x28/0x88
[ 2941.932188]        notifier_call_chain+0xa0/0x17c
[ 2941.932190]        blocking_notifier_call_chain+0x48/0x70
[ 2941.932193]        pm_qos_update_target+0xdc/0x1d0
[ 2941.932195]        freq_qos_apply+0x68/0x74
[ 2941.932198]        apply_constraint+0x100/0x148
[ 2941.932201]        __dev_pm_qos_update_request+0xb8/0x1fc
[ 2941.932203]        dev_pm_qos_update_request+0x3c/0x64
[ 2941.932206]        msm_devfreq_active+0xf8/0x194 [msm]
[ 2941.932227]        msm_gpu_submit+0x18c/0x1a8 [msm]
[ 2941.932249]        msm_job_run+0x98/0x11c [msm]
[ 2941.932272]        drm_sched_main+0x1a0/0x444 [gpu_sched]
[ 2941.932281]        kthread+0x124/0x128
[ 2941.932284]        ret_from_fork+0x10/0x20
[ 2941.932287] 
               other info that might help us debug this:

[ 2941.932289] Chain exists of:
                 &devfreq->lock --> dev_pm_qos_mtx --> &(c->notifiers)->rwsem

[ 2941.932296]  Possible unsafe locking scenario:

[ 2941.932298]        CPU0                    CPU1
[ 2941.932300]        ----                    ----
[ 2941.932301]   rlock(&(c->notifiers)->rwsem);
[ 2941.932304]                                lock(dev_pm_qos_mtx);
[ 2941.932307]                                lock(&(c->notifiers)->rwsem);
[ 2941.932309]   lock(&devfreq->lock);
[ 2941.932312] 
                *** DEADLOCK ***

[ 2941.932313] 4 locks held by ring0/359:
[ 2941.932315]  #0: ffff633110966170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x8c/0x11c [msm]
[ 2941.932342]  #1: ffff633110966208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xdc/0x1a8 [msm]
[ 2941.932368]  #2: ffffa40da2f91ed0 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x30/0x64
[ 2941.932374]  #3: ffff63310e3cace8 (&(c->notifiers)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x30/0x70
[ 2941.932381] 
               stack backtrace:
[ 2941.932383] CPU: 7 PID: 359 Comm: ring0 Not tainted 6.4.0-rc5 #64
[ 2941.932386] Hardware name: LENOVO 21BYZ9SRUS/21BYZ9SRUS, BIOS N3HET53W (1.25 ) 10/12/2022
[ 2941.932389] Call trace:
[ 2941.932391]  dump_backtrace+0x9c/0x11c
[ 2941.932395]  show_stack+0x18/0x24
[ 2941.932398]  dump_stack_lvl+0x60/0xac
[ 2941.932402]  dump_stack+0x18/0x24
[ 2941.932405]  print_circular_bug+0x26c/0x348
[ 2941.932407]  check_noncircular+0x134/0x148
[ 2941.932409]  __lock_acquire+0x13d8/0x2188
[ 2941.932411]  lock_acquire+0x1e8/0x310
[ 2941.932414]  __mutex_lock+0xa0/0x840
[ 2941.932416]  mutex_lock_nested+0x24/0x30
[ 2941.932418]  qos_min_notifier_call+0x28/0x88
[ 2941.932421]  notifier_call_chain+0xa0/0x17c
[ 2941.932424]  blocking_notifier_call_chain+0x48/0x70
[ 2941.932426]  pm_qos_update_target+0xdc/0x1d0
[ 2941.932428]  freq_qos_apply+0x68/0x74
[ 2941.932431]  apply_constraint+0x100/0x148
[ 2941.932433]  __dev_pm_qos_update_request+0xb8/0x1fc
[ 2941.932435]  dev_pm_qos_update_request+0x3c/0x64
[ 2941.932437]  msm_devfreq_active+0xf8/0x194 [msm]
[ 2941.932460]  msm_gpu_submit+0x18c/0x1a8 [msm]
[ 2941.932482]  msm_job_run+0x98/0x11c [msm]
[ 2941.932504]  drm_sched_main+0x1a0/0x444 [gpu_sched]
[ 2941.932511]  kthread+0x124/0x128
[ 2941.932514]  ret_from_fork+0x10/0x20


* Re: Adreno devfreq lockdep splat with 6.3-rc2
  2023-06-08 14:13 ` Johan Hovold
@ 2023-06-08 21:17   ` Rob Clark
  2023-06-09  6:17     ` Johan Hovold
From: Rob Clark @ 2023-06-08 21:17 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel, linux-arm-msm, freedreno, linux-kernel

On Thu, Jun 8, 2023 at 7:12 AM Johan Hovold <johan@kernel.org> wrote:
>
> Hi Rob,
>
> Have you had a chance to look at this regression yet? It prevents us
> from using lockdep on the X13s as lockdep is disabled as soon as we
> start the GPU.

Hmm, curious what is different between x13s and sc7180/sc7280 things?
Or did lockdep recently get more clever (or more annotation)?

I did spend some time a while back trying to bring some sense to
devfreq/pm-qos/icc locking:
https://patchwork.freedesktop.org/series/115028/

but haven't had time to revisit that for a while

BR,
-R

> On Wed, Mar 15, 2023 at 10:19:21AM +0100, Johan Hovold wrote:
> >
> > Since 6.3-rc2 (or possibly -rc1), I'm now seeing the below
> > devfreq-related lockdep splat.
> >
> > I noticed that you posted a fix for something similar here:
> >
> >       https://lore.kernel.org/r/20230312204150.1353517-9-robdclark@gmail.com
> >
> > but that particular patch makes no difference.
> >
> > From skimming the calltraces below and qos/devfreq related changes in
> > 6.3-rc1 it seems like this could be related to:
> >
> >       fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS constraint for idle clamp")
>
> Below is an updated splat from 6.4-rc5.
>
> Johan
>


* Re: Adreno devfreq lockdep splat with 6.3-rc2
  2023-06-08 21:17   ` Rob Clark
@ 2023-06-09  6:17     ` Johan Hovold
  2023-06-09 14:46       ` Rob Clark
From: Johan Hovold @ 2023-06-09  6:17 UTC (permalink / raw)
  To: Rob Clark
  Cc: Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel, linux-arm-msm, freedreno, linux-kernel

On Thu, Jun 08, 2023 at 02:17:45PM -0700, Rob Clark wrote:
> On Thu, Jun 8, 2023 at 7:12 AM Johan Hovold <johan@kernel.org> wrote:

> > Have you had a chance to look at this regression yet? It prevents us
> > from using lockdep on the X13s as it is disabled as soon as we start
> > the GPU.
> 
> Hmm, curious what is different between x13s and sc7180/sc7280 things?

It seems like lockdep needs to hit the tear down path in order to
detect the circular lock dependency. Perhaps you don't hit that on your
sc7180/sc7280? 

It is due to the fact that the panel is looked up way too late so that
bind fails unless the panel driver is already loaded when the msm drm
driver probes.

Manually loading the panel driver before msm makes the splat go away.
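
In other words, the #2 -> #3 edge (dev_pm_qos_mtx taken via
dev_pm_domain_detach() while &gmu->lock is held) is only recorded when the
failed bind actually runs the GPU teardown. Roughly the following, with
hypothetical names rather than the real a6xx code:

#include <linux/device.h>
#include <linux/mutex.h>
#include <linux/pm_domain.h>

struct example_gmu {
	struct mutex lock;
	struct device *dev;
};

static void example_gmu_remove(struct example_gmu *gmu)
{
	mutex_lock(&gmu->lock);			/* #2: &gmu->lock */

	/* Detaching the PM domain ends up taking dev_pm_qos_mtx (#3). */
	dev_pm_domain_detach(gmu->dev, false);

	mutex_unlock(&gmu->lock);
}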

> Or did lockdep recently get more clever (or more annotation)?

I think this is indeed a new problem related to some of the devfreq work
you did in 6.3-rc1 (e.g. fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS
constraint for idle clamp")).

> I did spend some time a while back trying to bring some sense to
> devfreq/pm-qos/icc locking:
> https://patchwork.freedesktop.org/series/115028/
> 
> but haven't had time to revisit that for a while

That's the series I link to below, but IIRC it did not look directly
applicable to the splat I see on X13s (e.g. does not involve
fs_reclaim).

> > On Wed, Mar 15, 2023 at 10:19:21AM +0100, Johan Hovold wrote:
> > >
> > > Since 6.3-rc2 (or possibly -rc1), I'm now seeing the below
> > > devfreq-related lockdep splat.
> > >
> > > I noticed that you posted a fix for something similar here:
> > >
> > >       https://lore.kernel.org/r/20230312204150.1353517-9-robdclark@gmail.com
> > >
> > > but that particular patch makes no difference.
> > >
> > > From skimming the calltraces below and qos/devfreq related changes in
> > > 6.3-rc1 it seems like this could be related to:
> > >
> > >       fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS constraint for idle clamp")

Johan


* Re: Adreno devfreq lockdep splat with 6.3-rc2
  2023-06-09  6:17     ` Johan Hovold
@ 2023-06-09 14:46       ` Rob Clark
From: Rob Clark @ 2023-06-09 14:46 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Abhinav Kumar, Dmitry Baryshkov, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel, linux-arm-msm, freedreno, linux-kernel

On Thu, Jun 8, 2023 at 11:17 PM Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Jun 08, 2023 at 02:17:45PM -0700, Rob Clark wrote:
> > On Thu, Jun 8, 2023 at 7:12 AM Johan Hovold <johan@kernel.org> wrote:
>
> > > Have you had a chance to look at this regression yet? It prevents us
> > > from using lockdep on the X13s as lockdep is disabled as soon as we
> > > start the GPU.
> >
> > Hmm, curious what is different between x13s and sc7180/sc7280 things?
>
> It seems like lockdep needs to hit the tear down path in order to
> detect the circular lock dependency. Perhaps you don't hit that on your
> sc7180/sc7280?
>
> It is due to the fact that the panel is looked up way too late so that
> bind fails unless the panel driver is already loaded when the msm drm
> driver probes.

Oh, this seems likely

> Manually loading the panel driver before msm makes the splat go away.
>
> > Or did lockdep recently get more clever (or more annotation)?
>
> I think this is indeed a new problem related to some of the devfreq work
> you did in 6.3-rc1 (e.g. fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS
> constraint for idle clamp")).
>
> > I did spend some time a while back trying to bring some sense to
> > devfreq/pm-qos/icc locking:
> > https://patchwork.freedesktop.org/series/115028/
> >
> > but haven't had time to revisit that for a while
>
> That's the series I link to below, but IIRC it did not look directly
> applicable to the splat I see on X13s (e.g. does not involve
> fs_reclaim).

Ahh, right, sorry I've not had time to do more than glance at the
thread.. and yeah, that one is mostly just trying to solve the reclaim
problem by moving allocations out from under the big-pm-qos-lock.
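
For context, "moving allocations out from under the big pm-qos lock" is the
usual pre-allocate-then-lock pattern; a hypothetical sketch (not the actual
patches in that series):

#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/types.h>

/* Hypothetical stand-ins; not the real dev_pm_qos structures. */
struct example_req {
	struct list_head node;
	s32 value;
};

struct example_dev {
	struct list_head requests;
};

static DEFINE_MUTEX(example_qos_mtx);

static int example_add_request(struct example_dev *dev, s32 value)
{
	struct example_req *req;

	/* Allocate up front so fs_reclaim is never entered with the lock held. */
	req = kzalloc(sizeof(*req), GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	mutex_lock(&example_qos_mtx);
	req->value = value;
	list_add(&req->node, &dev->requests);
	mutex_unlock(&example_qos_mtx);

	return 0;
}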

As far as fadcc3ab1302 ("drm/msm/gpu: Bypass PM QoS constraint for
idle clamp") goes, it should just be taking the lock that
dev_pm_qos_update_request() would have taken indirectly, although I
guess without some intervening lock?  We can't really avoid taking the
devfreq lock, I don't think.  But I'd have to spend time I don't have
right now digging into it..

BR,
-R


