* etnaviv: Possible circular locking on i.MX6QP
@ 2019-06-12 15:48 Fabio Estevam
  2019-06-27  9:43 ` Lucas Stach
  0 siblings, 1 reply; 2+ messages in thread
From: Fabio Estevam @ 2019-06-12 15:48 UTC (permalink / raw)
  To: Lucas Stach, Christian Gmeiner, Russell King - ARM Linux
  Cc: The etnaviv authors, DRI mailing list

Hi,

On an imx6qp-wandboard I get the warning below about a possible
circular locking dependency when running 5.1.9 built from
imx_v6_v7_defconfig.

This warning does not happen on the imx6q or imx6solo variants of the
wandboard, though.

Any ideas?

Thanks,

Fabio Estevam

** (matchbox-panel:708): WARNING **: Failed to load applet "battery"
(/usr/lib/matchbox-panel/libbattery.so: cannot open shared object
file: No such file or directory).
matchbox-wm: X error warning (0xe00003): BadWindow (invalid Window
parameter) (opcode: 12)
etnaviv-gpu 134000.gpu: MMU fault status 0x00000001
etnaviv-gpu 134000.gpu: MMU 0 fault addr 0x0805ffc0

======================================================
WARNING: possible circular locking dependency detected
5.1.9 #58 Not tainted
------------------------------------------------------
kworker/0:1/29 is trying to acquire lock:
(ptrval) (&(&gpu->fence_spinlock)->rlock){-...}, at:
dma_fence_remove_callback+0x14/0x50

but task is already holding lock:
(ptrval) (&(&sched->job_list_lock)->rlock){-...}, at: drm_sched_stop+0x1c/0x124

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&(&sched->job_list_lock)->rlock){-...}:
       drm_sched_process_job+0x5c/0x1c8
       dma_fence_signal+0xdc/0x1d4
       irq_handler+0xd0/0x1e0
       __handle_irq_event_percpu+0x48/0x360
       handle_irq_event_percpu+0x28/0x7c
       handle_irq_event+0x38/0x5c
       handle_fasteoi_irq+0xc0/0x17c
       generic_handle_irq+0x20/0x34
       __handle_domain_irq+0x64/0xe0
       gic_handle_irq+0x4c/0xa8
       __irq_svc+0x70/0x98
       cpuidle_enter_state+0x168/0x5a4
       cpuidle_enter_state+0x168/0x5a4
       do_idle+0x220/0x2c0
       cpu_startup_entry+0x18/0x20
       start_kernel+0x3e4/0x498

-> #0 (&(&gpu->fence_spinlock)->rlock){-...}:
       _raw_spin_lock_irqsave+0x38/0x4c
       dma_fence_remove_callback+0x14/0x50
       drm_sched_stop+0x98/0x124
       etnaviv_sched_timedout_job+0x7c/0xb4
       drm_sched_job_timedout+0x34/0x5c
       process_one_work+0x2ac/0x704
       worker_thread+0x2c/0x574
       kthread+0x134/0x148
       ret_from_fork+0x14/0x20
         (null)

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&sched->job_list_lock)->rlock);
                               lock(&(&gpu->fence_spinlock)->rlock);
                               lock(&(&sched->job_list_lock)->rlock);
  lock(&(&gpu->fence_spinlock)->rlock);

 *** DEADLOCK ***

3 locks held by kworker/0:1/29:
 #0: (ptrval) ((wq_completion)events){+.+.}, at: process_one_work+0x1f4/0x704
 #1: (ptrval) ((work_completion)(&(&sched->work_tdr)->work)){+.+.},
at: process_one_work+0x1f4/0x704
 #2: (ptrval) (&(&sched->job_list_lock)->rlock){-...}, at:
drm_sched_stop+0x1c/0x124

stack backtrace:
CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 5.1.9 #58
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: events drm_sched_job_timedout
[<c0112748>] (unwind_backtrace) from [<c010cfbc>] (show_stack+0x10/0x14)
[<c010cfbc>] (show_stack) from [<c0bd31ec>] (dump_stack+0xd8/0x110)
[<c0bd31ec>] (dump_stack) from [<c017a22c>]
(print_circular_bug.constprop.19+0x1bc/0x2f0)
[<c017a22c>] (print_circular_bug.constprop.19) from [<c017d408>]
(__lock_acquire+0x1778/0x1f38)
[<c017d408>] (__lock_acquire) from [<c017e3a4>] (lock_acquire+0xcc/0x1e8)
[<c017e3a4>] (lock_acquire) from [<c0bf4134>] (_raw_spin_lock_irqsave+0x38/0x4c)
[<c0bf4134>] (_raw_spin_lock_irqsave) from [<c0692710>]
(dma_fence_remove_callback+0x14/0x50)
[<c0692710>] (dma_fence_remove_callback) from [<c05d25b4>]
(drm_sched_stop+0x98/0x124)
[<c05d25b4>] (drm_sched_stop) from [<c064a3e8>]
(etnaviv_sched_timedout_job+0x7c/0xb4)
[<c064a3e8>] (etnaviv_sched_timedout_job) from [<c05d2964>]
(drm_sched_job_timedout+0x34/0x5c)
[<c05d2964>] (drm_sched_job_timedout) from [<c01468ec>]
(process_one_work+0x2ac/0x704)
[<c01468ec>] (process_one_work) from [<c0146d70>] (worker_thread+0x2c/0x574)
[<c0146d70>] (worker_thread) from [<c014cd88>] (kthread+0x134/0x148)
[<c014cd88>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
Exception stack(0xe81f7fb0 to 0xe81f7ff8)
7fa0:                                     00000000 00000000 00000000 00000000
7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
etnaviv-gpu 134000.gpu: recover hung GPU!
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: etnaviv: Possible circular locking on i.MX6QP
  2019-06-12 15:48 etnaviv: Possible circular locking on i.MX6QP Fabio Estevam
@ 2019-06-27  9:43 ` Lucas Stach
  0 siblings, 0 replies; 2+ messages in thread
From: Lucas Stach @ 2019-06-27  9:43 UTC (permalink / raw)
  To: Fabio Estevam, Christian Gmeiner, Russell King - ARM Linux
  Cc: The etnaviv authors, DRI mailing list

Hi Fabio,

On Wednesday, 2019-06-12 at 12:48 -0300, Fabio Estevam wrote:
> Hi,
> 
> On a imx6qp-wandboard I get the warning below about a possible
> circular locking dependency running 5.1.9 built from
> imx_v6_v7_defconfig.
> 
> Such warning does not happen on the imx6q or imx6solo variants of
> wandboard though.
> 
> Any ideas?

The issue reported by lockdep is real. You probably only see it on the
QP because it is uncovered there by a GPU hang triggered by an MMU
exception: MMUv1 cores, like those on the older i.MX6 variants, are
unable to signal MMU exceptions and instead just read from the dummy
page.

Some git history digging shows that the bug was introduced by commit
3741540e0413 ("drm/sched: Rework HW fence processing."), which is part
of kernel 5.1. The fix is 5918045c4ed4 ("drm/scheduler: rework job
destruction"), which is not in any released kernel yet and seems too
big for stable, so I'm not really sure what to do at this point.

Regards,
Lucas

> [quoted lockdep report snipped -- identical to the first message above]
